-
Notifications
You must be signed in to change notification settings - Fork 4
/
ibd2015_nkcell.Rmd
596 lines (485 loc) · 27.1 KB
/
ibd2015_nkcell.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
---
title: "RSS-NET analysis of IBD GWAS summary statistics and NK cell regulatory network"
author: "Xiang Zhu"
date: 2019-08-19
output: workflowr::wflow_html
editor_options:
chunk_output_type: console
---
[zenodo-ibd2015nkcell]: https://doi.org/10.5281/zenodo.3698240
[Zhu et al (2020)]: xxx
[Supplementary Notes]: xxx
[`ibd2015_nkcell/`]: https://github.com/SUwonglab/rss-net/tree/master/examples/ibd2015_nkcell
[`analysis_template.m`]: https://github.com/SUwonglab/rss-net/blob/master/examples/ibd2015_nkcell/analysis_template.m
[`ibd2015_nkcell.m`]: https://github.com/SUwonglab/rss-net/blob/master/examples/ibd2015_nkcell/ibd2015_nkcell.m
[`ibd2015_nkcell.sbatch`]: https://github.com/SUwonglab/rss-net/blob/master/examples/ibd2015_nkcell/ibd2015_nkcell.sbatch
[`summary_template.m`]: https://github.com/SUwonglab/rss-net/blob/master/examples/ibd2015_nkcell/summary_template.m
[`summarize_ibd2015_nkcell.m`]: https://github.com/SUwonglab/rss-net/blob/master/examples/ibd2015_nkcell/summarize_ibd2015_nkcell.m
[RSS-E baseline model]: https://doi.org/10.1038/s41467-018-06805-x
## Overview
Here we describe an end-to-end RSS-NET analysis of
inflammatory bowel disease (IBD) GWAS summary statistics
[(Liu et al, 2015)](https://doi.org/10.1038/ng.3359)
and a gene regulatory network inferred for natural killer (NK) cells
[(ENCODE Project Consortium, 2012)](https://doi.org/10.1038/nature11247).
This example illustrates the actual data analyses performed in [Zhu et al (2020)][].
To reproduce results of this example,
please use scripts in the directory [`ibd2015_nkcell/`][],
and follow the step-by-step guide below.
Before running any script in [`ibd2015_nkcell/`][],
please [install](setup.html) RSS-NET.
Since a real genome-wide analysis is conducted here,
this example is more complicated than the previous [simulation example](wtccc_bcell.html).
It is advisable to go through the previous [simulation example](wtccc_bcell.html)
before diving into the present real-world example.
With the software installed and the input data downloaded,
one should be able to run this example by simply typing the following line in shell:
```{r, eval=FALSE, engine='zsh'}
$ sbatch ibd2015_nkcell.sbatch
```
If a different directory is used to store the software, input data
and/or output data, please modify file paths in the given scripts accordingly.
## Step-by-step illustration
### Download data files
All data files required to run this example are freely available at Zenodo
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3698240.svg)][zenodo-ibd2015nkcell].
Please contact me (`xiangzhu[at]stanford.edu`)
if you have any trouble accessing these files.
After a complete download, you should see the following files.
```{r, eval=FALSE, engine='zsh'}
$ tree .
.
├── ibd2015_gene_grch37.mat
├── ibd2015_nkcell_data.md5
├── ibd2015_nkcell_full_results.zip
├── ibd2015_nkcell_summary_results.zip
├── ibd2015_NK_snp2gene_cis.mat
├── ibd2015_null_seed_459_squarem_step2.mat
├── ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_snp2net.mat
├── ibd2015_snp2gene.mat
├── ibd2015_sumstat.mat
└── Primary_Natural_Killer_cells_from_peripheral_blood_gene2gene.mat
0 directories, 10 files
```
To help readers confirm if they use the same files as we do,
we report 128-bit MD5 hashes of all files in `ibd2015_nkcell_data.md5`.
To help readers confirm if they can reproduce results of this example,
we also provide the full results (`ibd2015_nkcell_full_results.zip`)
and summarized results (`ibd2015_nkcell_summary_results.zip`)
in the same Zenodo deposit.
For simplicity and generality,
we introduce the following short-hand notations.
```matlab
gwas = 'ibd2015';
net = 'Primary_Natural_Killer_cells_from_peripheral_blood';
cis = 'NK';
```
#### 1. `${gwas}_sumstat.mat`: processed GWAS summary statistics and LD matrix estimates
This file contains processed IBD GWAS summary statistics
and LD matrix estimates for 1.1 million common SNPs.
This file is large (43G) because of the LD matrix .
```{r, eval=FALSE, engine='zsh'}
$ md5sum ibd2015_sumstat.mat
ad1763079ee7e46b21722f74e037a230 ibd2015_sumstat.mat
$ du -sh ibd2015_sumstat.mat
43G ibd2015_sumstat.mat
```
Let's look at the contents of `ibd2015_sumstat.mat`.
```matlab
>> sumstat = matfile('ibd2015_sumstat.mat');
>> sumstat
matlab.io.MatFile
Properties:
Properties.Source: 'ibd2015_sumstat.mat'
Properties.Writable: false
BR: [22x1 cell]
R: [22x1 cell]
SiRiS: [22x1 cell]
betahat: [22x1 cell]
chr: [22x1 cell]
pos: [22x1 cell]
se: [22x1 cell]
```
GWAS summary statistics and LD estimates are stored as
[cell arrays](https://www.mathworks.com/help/matlab/cell-arrays.html).
RSS-NET only uses the following variables:
- `betahat{j,1}`, single-SNP effect size estimates of all SNPs on chromosome `j`;
- `se{j,1}`, standard errors of `betahat{j, 1}`;
- `chr{j,1}` and `pos{j, 1}`, physical positions of these SNPs (GRCh37 build);
- `SiRiS{j,1}`, a [sparse matrix](https://www.mathworks.com/help/matlab/sparse-matrices.html),
defined as `repmat((1./se),1,p) .* R .* repmat((1./se)',p,1)`,
where `R` is the estimated LD matrix of these `p` SNPs.
#### 2. `${gwas}_snp2gene.mat`: physical distance between SNPs and genes
This file contains the physical distance between
each GWAS SNP and each protein-coding gene, within 1 Mb.
This file corresponds to ${\bf G}_j$ in the RSS-NET model.
```{r, eval=FALSE, engine='zsh'}
$ md5sum ibd2015_snp2gene.mat
7832838e2675e4cf3b85f471fed95554 ibd2015_snp2gene.mat
$ du -sh ibd2015_snp2gene.mat
224M ibd2015_snp2gene.mat
```
In this example, there are 18334 genes and 1081481 SNPs.
```matlab
>> snp2gene = matfile('ibd2015_snp2gene.mat');
>> snp2gene
matlab.io.MatFile
Properties:
Properties.Source: 'ibd2015_snp2gene.mat'
Properties.Writable: false
chr: [1081481x1 int32]
colid: [14126805x1 int32]
numgene: [1x1 int32]
numsnp: [1x1 int32]
pos: [1081481x1 int32]
rowid: [14126805x1 int32]
val: [14126805x1 double]
>> [snp2gene.numgene snp2gene.numsnp]
18334 1081481
```
The SNP-to-gene distance information is captured
by a three-column matrix `[colid rowid val]`.
For example, the distance between gene `1` and SNP `6` is `978947` bps.
```matlab
>> colid=snp2gene.colid; rowid=snp2gene.rowid; val=snp2gene.val;
>> [colid(6) rowid(6) val(6)]
1 6 978947
```
#### 3. `${net}_gene2gene.mat`: gene regulatory network
This file contains information of gene-to-gene
connections in a given regulatory network.
```{r, eval=FALSE, engine='zsh'}
$ md5sum Primary_Natural_Killer_cells_from_peripheral_blood_gene2gene.mat
35ac724b86f7777d87116cc48166caa2 Primary_Natural_Killer_cells_from_peripheral_blood_gene2gene.mat
$ du -sh Primary_Natural_Killer_cells_from_peripheral_blood_gene2gene.mat
1.7M Primary_Natural_Killer_cells_from_peripheral_blood_gene2gene.mat
```
```matlab
>> gene2gene = matfile('Primary_Natural_Killer_cells_from_peripheral_blood_gene2gene.mat');
>> gene2gene
matlab.io.MatFile
Properties:
Properties.Source: 'Primary_Natural_Killer_cells_from_peripheral_blood_gene2gene.mat'
Properties.Writable: false
colid: [110733x1 int32]
numgene: [1x1 int32]
rowid: [110733x1 int32]
val: [110733x1 double]
```
For implementation convenience, this file contains the trivial case
where each gene is mapped to itself with `val=1`.
```matlab
>> colid=gene2gene.colid; rowid=gene2gene.rowid; val=gene2gene.val;
>> [gene2gene.numgene sum(colid==rowid) unique(val(colid==rowid))]
18334 18334 1
```
For a given network, transcription factors (TFs) are stored in `rowid`
and target genes (TGs) are stored in `colid`.
In this example there are 3105 TGs and 376 TFs.
Among these TFs and TGs, there are 92399 edges.
The edge weights range from 0.61 to 1.
These TF-to-TG connections and edge weights correspond to
$\{{\bf T}_g,v_{gt}\}$ in the RSS-NET model.
```matlab
>> [length(unique(colid(colid ~= rowid))) length(unique(rowid(colid ~= rowid)))]
3105 376
>> [length(colid(colid ~= rowid)) length(rowid(colid ~= rowid))]
92399 92399
>> val_tftg = val(colid ~= rowid);
>> [min(val_tftg) quantile(val_tftg, 0.25) median(val_tftg) quantile(val_tftg, 0.75) max(val_tftg)]
0.6138 0.6324 0.6568 0.6949 1.0000
```
#### 4. `${gwas}_${net}_snp2net.mat`: SNP-to-network proximity annotation
This file contains binary annotations
whether a SNP is "near" the given network,
that is, within 100 kb of any network element
(TF, TG or associated regulatory elements).
```{r, eval=FALSE, engine='zsh'}
$ md5sum ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_snp2net.mat
d96cd9b32759f954cc37680dc6aeafd8 ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_snp2net.mat
$ du -sh ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_snp2net.mat
21M ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_snp2net.mat
```
In this example, there are 1081481 GWAS SNPs and
382443 of them are near the NK cell network (i.e. `val=1`).
```matlab
>> snp2net = matfile('ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_snp2net.mat');
>> snp2net
matlab.io.MatFile
Properties:
Properties.Source: 'ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_snp2net.mat'
Properties.Writable: false
chr: [1081481x1 int32]
pos: [1081481x1 int32]
snpid: [1081481x1 int32]
val: [1081481x1 double]
window: [1x1 double]
>> [length(snp2net.val) sum(snp2net.val) snp2net.window]
1081481 382443 100000
>> unique(snp2net.val)'
0 1
```
#### 5. `${gwas}_${cis}_snp2gene_cis.mat`: SNP-to-gene cis regulation
This file contains the SNP-to-gene cis regulation scores
derived from context-matching cis eQTL studies.
This file corresponds to $(c_{jg}-1)$ in the RSS-NET model.
```{r, eval=FALSE, engine='zsh'}
$ md5sum ibd2015_NK_snp2gene_cis.mat
dedad8e25773fad69576dbce0f7d9f93 ibd2015_NK_snp2gene_cis.mat
$ du -sh ibd2015_NK_snp2gene_cis.mat
165M ibd2015_NK_snp2gene_cis.mat
```
```matlab
>> snp2gene_cis = matfile('ibd2015_NK_snp2gene_cis.mat');
>> snp2gene_cis
matlab.io.MatFile
Properties:
Properties.Source: 'ibd2015_NK_snp2gene_cis.mat'
Properties.Writable: false
colid: [10790012x1 int32]
numgene: [1x1 int32]
numsnp: [1x1 int32]
rowid: [10790012x1 int32]
val: [10790012x1 double]
```
In this example, there are 10790012 SNP-gene pairs
with cis regulation scores available,
consisting of 829280 SNPs and 18230 genes.
The cis regulation scores (`val`) range from 0 to 0.76.
The cis regulation scores used in this example are derived
from recently published cis eQTLs in NK cells
([Schmiedel et al, 2018](https://doi.org/10.1016/j.cell.2018.10.022)).
```matlab
>> [snp2gene_cis.numsnp length(unique(snp2gene_cis.rowid)) snp2gene_cis.numgene length(unique(snp2gene_cis.colid))]
1081481 829280 18334 18230
>> val=snp2gene_cis.val;
>> [min(val) quantile(val, 0.25) median(val) quantile(val, 0.75) max(val)]
0 0.0226 0.0636 0.1172 0.7622
```
### Run RSS-NET analysis
To facilitate running RSS-NET on real data,
we provide a generic script [`analysis_template.m`][].
For the present example, the RSS-NET analysis is implemented by
[`ibd2015_nkcell.m`][] and [`ibd2015_nkcell.sbatch`][].
#### 1. Specify analysis-specific variables
We need to specify a few analysis-specific variables
that are required by [`analysis_template.m`][],
a template script that fits RSS-NET to the given data.
For this example, we use [`ibd2015_nkcell.m`][] for the specification.
In brief, the following variables are specified.
- Data names: `gwas_name`, `net_name`, `cis_name`;
- GWAS sample size and number of genes: `nsam`, `ngene`;
- Hyper-parameter grid: `eta_set`, `rho_set`, `theta0_set`, `theta_set`.
In general, if you want to use RSS-NET to analyze a different GWAS
and/or network, simply modify [`ibd2015_nkcell.m`][]
and there is no need to change [`analysis_template.m`][].
#### 2. Submit job arrays
For a given GWAS and a given regulatory network,
all RSS-NET analysis tasks are almost identical
and they only differs in hyper-parameter values.
To exploit this feature, we run one RSS-NET analysis as
a [job array](https://slurm.schedmd.com/job_array.html)
with multiple tasks that run in parallel.
To this end, we write a simple sbatch script
[`ibd2015_nkcell.sbatch`][],
and submit it to a cluster with [`Slurm`](https://slurm.schedmd.com/) available.
```{r, eval=FALSE, engine='zsh'}
$ sbatch ibd2015_nkcell.sbatch
```
After the submission, 125 tasks are created.
As shown below, multiple tasks should run in different nodes simultaneously.
```{r, eval=FALSE, engine='zsh'}
$ squeue -u xiangzhu
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
62554249_107 owners ibd2015_ xiangzhu R 0:32 1 sh02-17n12
62554249_108 owners ibd2015_ xiangzhu R 0:32 1 sh02-17n12
62554249_109 owners ibd2015_ xiangzhu R 0:32 1 sh01-28n08
62554249_110 owners ibd2015_ xiangzhu R 0:32 1 sh01-17n18
62554249_111 owners ibd2015_ xiangzhu R 0:32 1 sh01-26n33
62554249_112 owners ibd2015_ xiangzhu R 0:32 1 sh01-27n30
```
For each task of this job array, we request 1 node with 8 CPUs and 32 Gb total memory
and set the maximum job wall-clock time as 12.5 hours
(see [`ibd2015_nkcell.sbatch`][] for details).
The actual memory utilized per task is 26.58 GB (efficiency: 85.06% of 31.25 GB).
Across all 125 tasks, the actual running time per task ranges
from 5 minutes to 8.9 hours, with median being 2.1 hours.
We request 8 CPUs for each task because RSS-NET takes advantage of
[`parfor`](https://www.mathworks.com/help/parallel-computing/parfor.html) in the
[MATLAB Parallel Computing Toolbox](https://www.mathworks.com/help/distcomp/index.html).
If this toolbox is not available in your environment,
you can still run the same RSS-NET codes on this example (in a serial manner),
with longer computation time per task.
Each task of the job array saves results in a
[Version 7 MAT-file](https://www.mathworks.com/help/matlab/import_export/mat-file-versions.html),
`${gwas}_${net}_${cis}_out_${id}.mat`.
Each MAT-file contains variational estimates for a given set of hyper-parameter values.
For example, the following MAT-file
`ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_out_66.mat`
stores RSS-NET results based on the 66-th set of hyper-parameter values from the grid.
```matlab
>> res = matfile('ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_out_66.mat');
>> res
matlab.io.MatFile
Properties:
Properties.Source: 'ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_out_66.mat'
Properties.Writable: false
alpha: [1081481x1 double]
logw: [1x1 double]
mu: [1081481x1 double]
run_time: [1x1 double]
s: [1081481x1 double]
sigb: [1x1 double]
sige: [1x1 double]
theta: [1x1 double]
theta0: [1x1 double]
```
Here `[alpha mu s]` correspond to the optimal variational parameters
$\{\alpha_j^\star,\mu_j^\star,(\tau_j^\star)^2\}$ for the given hyper-parameters,
`logw` corresponds to the variational lower bound $F^\star$ and
`[theta0 theta sigb sige]` corresponds to $\{\theta_0,\theta,\sigma_0,\sigma\}$.
Please see [Supplementary Notes][] of [Zhu et al (2020)][] for definitions.
### Summarize RSS-NET results
The RSS-NET result files `${gwas}_${net}_${cis}_out_${id}.mat`
shown above can be further summarized into
both network-level and gene-level statistics
as reported in [Zhu et al (2020)][].
To facilitate summarizing RSS-NET results,
we provide a generic script [`summary_template.m`][].
For the present example, the RSS-NET summary is
implemented by [`summarize_ibd2015_nkcell.m`][].
If you need to summarize a different RSS-NET analysis,
simply modify [`summarize_ibd2015_nkcell.m`][]
and there is no need to change [`summary_template.m`][].
For this example, simply run the following line in a Matlab console.
```matlab
>> run summarize_ibd2015_nkcell.m;
```
Running [`summarize_ibd2015_nkcell.m`][] yields two (much smaller) MAT-files:
`${gwas}_${net}_${cis}_results_model.mat` that stores network-level enrichment results,
and `${gwas}_${net}_${cis}_results_gene.mat` that stores locus-level association results.
#### 1. `${gwas}_${net}_${cis}_results_model.mat`: network-level enrichments
To assess whether a regulatory network is enriched
for genetic associations with a trait,
we evaluate a Bayes factor (BF) comparing
the baseline model ($M_0:\theta=0~\text{and}~\sigma^2=0$)
in RSS-NET with an enrichment model.
```matlab
>> model_res = matfile('ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_results_modet');
>> model_res
matlab.io.MatFile
Properties:
Properties.Source: 'ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_results_model.mat'
Properties.Writable: false
log10_bf: [1x1 double]
log10_bf_ns: [1x1 double]
log10_bf_nt: [1x1 double]
log10_bf_ts: [1x1 double]
logw: [125x1 double]
sigb: [125x1 double]
sige: [125x1 double]
theta: [125x1 double]
theta0: [125x1 double]
time: [125x1 double]
```
Here `log10_bf*` are log 10 BFs comparing
the following 4 enrichment models against $M_0$.
- `log10_bf`: $M_1:\theta>0~\text{or}~\sigma^2>0$;
- `log10_bf_ns`: $M_{11}:\theta>0~\text{and}~\sigma^2=0$;
- `log10_bf_nt`: $M_{12}:\theta=0~\text{and}~\sigma^2>0$;
- `log10_bf_ts`: $M_{13}:\theta>0~\text{and}~\sigma^2>0$.
Because $M_1$ is more flexible than other models,
we mainly use `log10_bf` as recommended by [Zhu et al (2020)][].
By running this example, we reproduce the enrichment BFs of
[IBD GWAS and NK cell network](https://xiangzhu.github.io/rss-peca/ibd2015_net.html)
reported in [Zhu et al (2020)][].
The NK cell network shows strong enrichment of IBD genetic associations,
which seems consistent with the role of NK cell in autoimmune diseases like IBD.
```matlab
>> load ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_results_model.mat
>> [log10_bf log10_bf_ns log10_bf_nt log10_bf_ts]
35.7048 29.4216 15.1461 35.8986
```
Let's perform a more rigorous check of reproducibility.
For the same hyper-parameter values, we compare the resulting
variational lower bounds from my previous run and from the current run.
Differences are numerical negligible.
```matlab
>> res_file = 'ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_results_model.mat';
>> old_path = '~/Dropbox/rss/Data/peca_human/job_camp/gwas33_net71/results/model_results/peca_encode/ibd2015/';
>> new_path = '~/GitHub/rss-net/examples/ibd2015_nkcell/results/';
>> old_res = matfile(strcat(old_path,res_file));
>> new_res = matfile(strcat(new_path,res_file));
>> [min(old_res.logw - new_res.logw) median(old_res.logw - new_res.logw) max(old_res.logw - new_res.logw)]
1.0e-10 *
-0.0568 0 0.1592
```
#### 2. `${gwas}_${net}_${cis}_results_gene.mat`: locus-level associations
To summarize association between a locus and a trait,
we compute $P_1^{\sf net}$, the posterior probability that at least
one SNP $j$ in the locus is associated with the trait ($\beta_j\neq 0$):
$$
P_1^{\sf net}=1-\Pr(\beta_j=0,~\forall j\in\text{locus}~|~\text{data},M_1).
$$
As in [Zhu et al (2020)][], a locus is defined as the transcribed region
of a gene plus 100 kb upstream and downstream.
The locus definition is provided in `ibd2015_gene_grch37.mat`.
```matlab
>> gene_res=matfile('ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_results_gene.mat');
>> gene_res
matlab.io.MatFile
Properties:
Properties.Source: 'ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_results_gene.mat'
Properties.Writable: false
P1_gene: [18334x1 double]
gene_chr: [18334x1 double]
gene_start: [18334x1 double]
gene_stop: [18334x1 double]
```
Here `P1_gene` corresponds to $P_1^{\sf net}$ and `[gene_chr gene_start gene_stop]`
denote physical position of genes based on GRCh37.
Let's perform a reproducibility check for gene-level results.
Again, my previous analysis and the current analysis yield
numerically identical answers.
```matlab
>> res_file = 'ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_results_gene.mat';
>> old_path = '~/Dropbox/rss/Data/peca_human/job_camp/gwas33_net71/results/gene_results/peca_encode/ibd2015/';
>> new_path = '~/GitHub/rss-net/examples/ibd2015_nkcell/results/';
>> old_res = matfile(strcat(old_path,res_file));
>> new_res = matfile(strcat(new_path,res_file));
>> [min(old_res.P1_gene - new_res.P1_gene) median(old_res.P1_gene - new_res.P1_gene) max(old_res.P1_gene - new_res.P1_gene)]
1.0e-15 *
-0.2220 0.0278 0.5551
```
## More examples
The RSS-NET analyses of 18 complex traits and 38 gene regulatory networks
reported in [Zhu et al (2020)][] are essentially 684 modified reruns
of the example above (with different input GWAS and/or network data,
and different hyper-parameter grids).
Our full analysis results are publicly available at
<https://xiangzhu.github.io/rss-peca>.
## Appendix
Careful readers may notice an optional input data file
`ibd2015_null_seed_459_squarem_step2.mat` specified in
[`ibd2015_nkcell.m`][] and used in [`analysis_template.m`][].
This file provides the [RSS-E baseline model][] fitting results of IBD GWAS data.
If this file is available, [`analysis_template.m`][] uses
the optimal [RSS-E baseline model][] results to initialize RSS-NET.
If not, RSS-NET uses a random initialization.
For this example, here are the enrichment BFs based on using
the optimal [RSS-E baseline model][] results to initialize RSS-NET.
```matlab
>> [log10_bf log10_bf_ns log10_bf_nt log10_bf_ts]
35.7048 29.4216 15.1461 35.8986
```
Here are the enrichment BFs based on random initialization,
which are consistent with, but smaller than previous results.
```matlab
>> [log10_bf log10_bf_ns log10_bf_nt log10_bf_ts]
32.1495 29.5295 11.0758 32.3432
```
Using optimal [RSS-E baseline model][] results to initialize RSS-NET
is not required, but highly recommended in practice,
because this often yields a better fit as shown above.
Please see [this tutorial](https://stephenslab.github.io/rss/Example-5B)
for more details of fitting [RSS-E baseline model][] on GWAS data.