diff --git a/manuscript_results/motivating_example.ipynb b/manuscript_results/motivating_example.ipynb index 688d3f5..fc5c6af 100644 --- a/manuscript_results/motivating_example.ipynb +++ b/manuscript_results/motivating_example.ipynb @@ -8,7 +8,7 @@ "source": [ "# A demonstration of SuSiE's motivations\n", "\n", - "This document explains with toy example illustration the unique type of inference SuSiE is interested in." + "This document explains with toy example illustration the unique type of inference SuSiE is interested in. For a more complicated where variables are in high but not perfect correlation please see [this notebook](motivating_example_high_corr.html)." ] }, { @@ -849,9 +849,9 @@ "" ] ], - "version": "0.17.2" + "version": "0.21.0" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/manuscript_results/motivating_example_high_corr.ipynb b/manuscript_results/motivating_example_high_corr.ipynb new file mode 100644 index 0000000..e8b13dc --- /dev/null +++ b/manuscript_results/motivating_example_high_corr.ipynb @@ -0,0 +1,257 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Experiment with variables of given high correlation structure\n", + "\n", + "This notebook is meant to address to a shared concern from two referees. The [motivating example](motivating_example.html) in the manuscript was designed to be a simple toy for illustrating the novel type of inference SuSiE offers. Here are some slightly more complicated examples, based on the motivating example, but with variables in high (rather than perfect) correlations with each other." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## $x_1$ and $x_2$ are highly correlated" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Following a reviewer's\n", + "suggestion, we simulated two variables, $x_1$ and $x_2$, with high but\n", + "not perfect correlation ($0.9$). Specifically, we simulated $n = 600$\n", + "samples stored as an $X_{600 \\times 2}$ matrix, in which each row was\n", + "drawn *i.i.d.* from a normal distribution with mean zero and\n", + "$\\mathrm{cor}(x_1, x_2) = 0.9$. We then simulated $y_i = x_{i1} \\beta_1\n", + "+ x_{i2} \\beta_2 + \\varepsilon_i$, with $\\beta_1 = 1, \\beta_2 = 0$,\n", + "and $\\varepsilon_i$ *i.i.d.* normal with zero mean and standard\n", + "deviation of 3. We performed 1,000 replicates of this simulation\n", + "(generated with different random number seeds)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this simulation, the correlation between $x_1$ and $x_2$ is still\n", + "sufficiently high (0.9) to make distinguishing between the two\n", + "variables somewhat possible, but not non entirely straightforward. For\n", + "example, when we run lasso (using `cv.glmnet` from the `glmnet`\n", + "R package) on these data it wrongly selected $x_2$ as having\n", + "non-zero coefficient in about 10\\% of the simulations (95 out of\n", + "1,000), and correctly selected $x_1$ in about 96\\% of simulations (956\n", + "out of 1,000). Note that the lasso does not assess uncertainty in\n", + "variable selection, so these results are not directly comparable\n", + "with SuSiE CSs below; however, the lasso results demonstrate that\n", + "distinguishing the correct variable here is possible, but not so easy\n", + "that the example is uninteresting." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Ideally, then, SuSiE should identify variable $x_1$ as an effect\n", + "variable and drop $x_2$ as often as possible. However, due to the high\n", + "correlation between the variables, it is inevitable that some\n", + "95\\% SuSiE credible sets (CS) will also contain $x_2$. Most\n", + "important is that we should avoid, as much as possible, reporting a CS\n", + "that contains *only* $x_2$, since the goal is that 95\\% of CSs\n", + "should contain at least one effect variable. The SuSiE results (SuSiE version 0.9.1 on R 3.5.2) \n", + "are summarized below. The code used for the simulation [can be found here](https://github.com/stephenslab/susie-paper/blob/master/src/ref_3_question.R)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "| CSs | count |\n", + "| :---- | ----: |\n", + "| (1) | 829 |\n", + "| (1,2) | 169 |\n", + "| **(2)** | 2 |" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Highlighted in **bold** are CSs that do *not* contain\n", + "the true effect variable --- there are 2 of them out of 1,000 CSs\n", + "detected. In summary, SuSiE precisely identifies the effect\n", + "variable in a single CS in the majority (83\\%) of the simulations, and\n", + "provides a \"valid\" CS (*i.e.*, one containing an effect\n", + "variable) in almost all simulations (998 out of 1,000). Further, even\n", + "when SuSiE reports a CS including both variables, it consistently\n", + "assigns higher posterior inclusion probability (PIP) to the correct\n", + "variable, $x_1$: among the 169 CSs that contain both variables, the\n", + "median PIPs for $x_1$ and $x_2$ were 0.86 and 0.14, respectively." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## When an additional non-effect variable highly correlated with both variable groups" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Another referee suggested the following:\n", + "\n", + "> Suppose\n", + "we have another predictor $x_5$, which is both correlated with $(x_1,\n", + "x_2)$ and $(x_3, x_4)$. Say $\\mathrm{cor}(x_1, x_5) = 0.9$,\n", + "$\\mathrm{cor}(x_2, x_5) = 0.7$, and $\\mathrm{cor}(x_5, x_3)\n", + "= \\mathrm{cor}(x_5, x_4) = 0.8$. Does the current method assign $x_5$\n", + "to the $(x_1, x_2)$ group or the $(x_3, x_4)$ group?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Following the suggestion, we simulated $x_1, \\ldots, x_5$ from a\n", + "multivariate normal with zero mean and the covariance matrix\n", + "approximately as given in Table below. (Since this matrix is\n", + "not quite positive definite, in our R code we used `nearPD` from\n", + "the `Matrix` package to generate the nearest positive definite\n", + "matrix --- the entries of the resulting covariance matrix differ only\n", + "very slightly from those in Table below, with a maximum\n", + "absolute difference of 0.0025 between corresponding elements in the\n", + "two matrices).\n", + "\n", + "| | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ |\n", + "| ------: | ------: | ------: | ------: | ------: | ------: |\n", + "| $x_1$ | 1.00 | 0.92 | 0.70 | 0.70 | 0.90 |\n", + "| $x_2$ | 0.92 | 1.00 | 0.70 | 0.70 | 0.70 |\n", + "| $x_3$ | 0.70 | 0.70 | 1.00 | 0.92 | 0.80 |\n", + "| $x_4$ | 0.70 | 0.70 | 0.92 | 1.00 | 0.80 |\n", + "| $x_5$ | 0.90 | 0.70 | 0.80 | 0.80 | 1.00 |\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We simulated $n = 600$ samples from this\n", + "multivariate normal distribution, then we simulated $n = 600$\n", + "responses $y_i$ from the regression model $y_i = x_{i1} \\beta_1 +\n", + "\\cdots x_{i5} \\beta_5 + \\varepsilon_i$, with $\\beta = (0, 1, 1, 0,\n", + "0)^T$, and $\\varepsilon_i$ *i.i.d.* normal with zero mean and\n", + "standard deviation of 3. We repeated this simulation procedure 1,000\n", + "times with different random seeds, and each time we fit a SuSiE\n", + "model to the simulated data by running the IBSS algorithm.\\footnote{To\n", + "simplify the example, we ran the IBSS algorithm with $L = 2$, and\n", + "fixed the $\\sigma_0^2 = 1$. Similar results were obtained when we used\n", + "larger values of $L$, and when $\\sigma_0^2$ was estimated. For more\n", + "details on how the data were simulated and how the SuSiE models\n", + "were fitted to the data sets, [see this script](https://github.com/stephenslab/susie-paper/blob/master/src/ref_4_question.R)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Like the toy motivating example given in the paper, in this simulation\n", + "the first two predictors are strongly correlated with each other, so\n", + "it may be difficult to distinguish among them, and likewise for the\n", + "third and fourth predictors. The fifth predictor, which has no effect\n", + "on $y$, potentially complicates matters because it is also strongly\n", + "correlated with the other predictors. Despite this complication, our\n", + "basic goal remains the same: the Credible Sets inferred by SuSiE\n", + "should capture the true effects most of the time, while also\n", + "minimizing \"false positive\" CSs that do not contain any true\n", + "effects. (Further, each CS should, ideally, be as small as possible.)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Table below summarizes the results of these simulations:\n", + "the left-hand column gives a unique result (a combination of CSs), and\n", + "the right-hand column gives the number of times this unique result\n", + "occurred among the 1,000 simulations. The CS combinations are ordered\n", + "by the frequency of their occurrence in the simulations. We highlight\n", + "in **bold** CSs that do not contain a true effect.\n", + "\n", + "| CSs | count |\n", + "| :------------- | ----: |\n", + "| (2), (3) | 551 |\n", + "| (2), (3,4) | 212 |\n", + "| (1,2), (3) | 176 |\n", + "| (1,2), (3,4) | 38 |\n", + "| (2), (3,4,5) | 9 |\n", + "| **(1)**, (3,4) | 3 |\n", + "| (2), **(4)** | 3 |\n", + "| (1,2), (3,4,5) | 2 |\n", + "| **(1)**, (3) | 1 |\n", + "| (1,2), **(4)** | 1 |\n", + "| (2), (3,5) | 1 |\n", + "| (3), (1,2,5) | 1 |\n", + "| (3), (1,2,3) | 1 |\n", + "| (3,4), (1,2,4) | 1 |" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the majority (551) of the simulations, SuSiE precisely identiied\n", + "the true effect variables, and no others. In most other cases,\n", + "SuSiE identified two CSs, each containing a correct effect variable, and\n", + "with one or more other variables included due to high correlation with\n", + "the true-effect variable. The referee asks specifically about how the\n", + "additional variable $x_5$ behaves in this example. In practice, $x_5$\n", + "was rarely included in a CS. In the few cases where $x_5$ *was*\n", + "included in a CS, the results were consistent with the simulation\n", + "setting; $x_5$ was included more frequently with $x_3$ and/or $x_4$\n", + "(12 times) rather than $x_2$ and/or $x_1$ (only once). In no\n", + "simulations did SuSiE form a large group that contains all five\n", + "predictors." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This example actually highlights the benefits of SuSiE compared to\n", + "alternative approaches (e.g., hierinf) that *first* cluster the\n", + "variables into groups based on the correlation structure, then test\n", + "the groups. As we pointed out in the manuscript, this alternative\n", + "approach (first cluster variables into groups, then test groups) would\n", + "work well in the toy example in the paper, but in general it requires\n", + "*ad hoc* decisions about how to cluster variables. In this more\n", + "complex example raised by the referee, it is far from clear how to\n", + "cluster the variables. \\SuSiE avoids this problem because there is\n", + "no pre-clustering of variables; instead, the SuSiE CSs are computed\n", + "directly from an (approximate) posterior distribution (which takes\n", + "into account how the variables $x$ are correlated with each other, as\n", + "well as their relationship with $y$)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "R", + "language": "R", + "name": "ir" + }, + "language_info": { + "codemirror_mode": "r", + "file_extension": ".r", + "mimetype": "text/x-r-source", + "name": "R", + "pygments_lexer": "r", + "version": "3.5.3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/manuscript_results/ref4_question.ipynb b/manuscript_results/ref4_question.ipynb deleted file mode 100644 index a1065e0..0000000 --- a/manuscript_results/ref4_question.ipynb +++ /dev/null @@ -1,1263 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Experiment with variables of given high correlation structure\n", - "\n", - "This notebook is meant to address to a concern from a referee." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "It says the following:\n", - "\n", - "> Suppose\n", - "we have another predictor $x_5$, which is both correlated with $(x_1,\n", - "x_2)$ and $(x_3, x_4)$. Say $\\mathrm{cor}(x_1, x_5) = 0.9$,\n", - "$\\mathrm{cor}(x_2, x_5) = 0.7$, and $\\mathrm{cor}(x_5, x_3)\n", - "= \\mathrm{cor}(x_5, x_4) = 0.8$. Does the current method assign $x_5$\n", - "to the $(x_1, x_2)$ group or the $(x_3, x_4)$ group?\n", - "\n", - "Here we investigate the problem using simulations for given correlation structure over multiple replicates." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Correlation structure between variables" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First we determine covariance matrix that satisfies correlation structure as outlined above, " - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "h = 0.92\n", - "l = 0.7\n", - "cormat = rbind(c(1.0, h, l, l, 0.9),\n", - " c(h, 1.0, l, l, 0.7),\n", - " c(l, l, 1.0, h, 0.8),\n", - " c(l, l, h, 1.0, 0.8),\n", - " c(0.9, 0.7, 0.8, 0.8, 1.0))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "where we assume high correlation between $x_1$ and $x_2$, and $x_3$ and $x_4$ ($R_{12} = R_{34} = 0.92$). \n", - "We set $R_{13} = R_{14} = R_{23} = R_{24} = 0.7$ and find the nearest \n", - "positive definite matrix for this correlation structure:" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "covmat = as.matrix(Matrix::nearPD(cormat)$mat)" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\n", - "
A matrix: 5 × 5 of type dbl[,5]
1.00000000.91665510.69951850.69951850.8965043
0.91665511.00000000.69930320.69930320.7003526
0.69951850.69930321.00000000.92000590.7991611
0.69951850.69930320.92000591.00000000.7991611
0.89650430.70035260.79916110.79916111.0000000
\n" - ], - "text/latex": [ - "A matrix: 5 × 5 of type dbl{[},5{]}\n", - "\\begin{tabular}{lllll}\n", - "\t 1.0000000 & 0.9166551 & 0.6995185 & 0.6995185 & 0.8965043\\\\\n", - "\t 0.9166551 & 1.0000000 & 0.6993032 & 0.6993032 & 0.7003526\\\\\n", - "\t 0.6995185 & 0.6993032 & 1.0000000 & 0.9200059 & 0.7991611\\\\\n", - "\t 0.6995185 & 0.6993032 & 0.9200059 & 1.0000000 & 0.7991611\\\\\n", - "\t 0.8965043 & 0.7003526 & 0.7991611 & 0.7991611 & 1.0000000\\\\\n", - "\\end{tabular}\n" - ], - "text/markdown": [ - "\n", - "A matrix: 5 × 5 of type dbl[,5]\n", - "\n", - "| 1.0000000 | 0.9166551 | 0.6995185 | 0.6995185 | 0.8965043 |\n", - "| 0.9166551 | 1.0000000 | 0.6993032 | 0.6993032 | 0.7003526 |\n", - "| 0.6995185 | 0.6993032 | 1.0000000 | 0.9200059 | 0.7991611 |\n", - "| 0.6995185 | 0.6993032 | 0.9200059 | 1.0000000 | 0.7991611 |\n", - "| 0.8965043 | 0.7003526 | 0.7991611 | 0.7991611 | 1.0000000 |\n", - "\n" - ], - "text/plain": [ - " [,1] [,2] [,3] [,4] [,5] \n", - "[1,] 1.0000000 0.9166551 0.6995185 0.6995185 0.8965043\n", - "[2,] 0.9166551 1.0000000 0.6993032 0.6993032 0.7003526\n", - "[3,] 0.6995185 0.6993032 1.0000000 0.9200059 0.7991611\n", - "[4,] 0.6995185 0.6993032 0.9200059 1.0000000 0.7991611\n", - "[5,] 0.8965043 0.7003526 0.7991611 0.7991611 1.0000000" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "cov2cor(covmat)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "It still satisfy the structure as outlined at the beginning of the document. We use the nearest PD found as the covariance matrix for simulation below." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Simulation of features and response variables" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We simulate an $X$ matrix of $N=600$ samples and $P_1=5$ variables, \n", - "$$X_{p} \\sim MVN(0, \\Sigma)$$\n", - "\n", - "where $\\Sigma$ is covariance matrix as defined above." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We then expand $X$ to having a total of $P=2000$ variables where the other 1995 variables are independent --- they come from multivariate normal $MVN(0,I)$ with $I$ being the identity matrix. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We simulate effect size $b$ a length $p$ vector with zero elsewhere except:\n", - "\n", - "- Scenario 1: $x_2$ and $x_3$ are effect variables and $x_5$ is not effect variable.\n", - "- Scenario 2: $x_2$, $x_3$ and $x_5$ are all effect variables.\n", - "\n", - "Non-zero effects are drawn from $N(0,1)$. Response $y$ is then generated using a linear model $y = Xb + e$, $e \\sim N(0, \\sigma I_n)$ with $\\sigma$ chosen such that $X$ explains 20% variation in $y$.\n", - "\n", - "The sample size, number of variables and effect size mimic a relatively small sample genetic association fine-mapping application." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Analysis plan" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We run 500 replicates for the above 2 scenarios. We focus our evaluation on $x_5$ and ask:\n", - "\n", - "1. When $x_5$ is not a effect variable, how often is it dropped from the results, or grouped with $(x_1, x_2)$, or with $(x_3,x_4)$.\n", - "2. When $x_5$ is an effect variable, how often is it considered as a set on its own, or grouped with $(x_1, x_2)$, or with $(x_3,x_4)$." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We also assess impact on convergence issues by running SuSiE with initializations to true parameter values and see how different the result can be compared to not initialze it this way." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The code for the experiments can be found at: https://github.com/stephenslab/susie-paper/tree/master/ref_question_dsc" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Results" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Calling: dsc-query ref4_question -o /tmp/jobs/65254510/RtmpsTwJwm/file886d1d05d894.csv --target \"simulate.b5 analyze.cs analyze.elbo analyze\" --force \n", - "Loaded dscquery output table with 2000 rows and 6 columns.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "dscquery is returning a list because one or more outputs are complex; consider converting the list to a tibble using the \"tibble\" package\n" - ] - } - ], - "source": [ - "setwd('../ref_question_dsc/')\n", - "res = dscrutils::dscquery('ref4_question',targets = c('simulate.b5','analyze.cs','analyze.elbo','analyze'), module.output.files='analyze')\n", - "saveRDS(res, 'ref4_question_20200123.rds')" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\n", - "
DSCsimulate.b5analyze.csanalyze.elboanalyzeanalyze.output.file
1 0 1.0000000, 2.0000000, 0.9081847, 0.9540923, 0.9540923, 1.0000000, 0.9500000-1591.2711 susie susie/simulate_1_susie_1
1 1 2.00, 1.00, 1.00, 1.00, 1.00, 0.95-1543.7433 susie susie/simulate_2_susie_1
2 0 3.00, 1.00, 1.00, 1.00, 1.00, 0.95-1291.7338 susie susie/simulate_3_susie_1
2 1 0.95 -1005.7306 susie susie/simulate_4_susie_1
3 0 2.00, 1.00, 1.00, 1.00, 1.00, 0.95 -892.7243 susie susie/simulate_5_susie_1
3 1 5.00, 1.00, 1.00, 1.00, 1.00, 0.95-1579.8866 susie susie/simulate_6_susie_1
\n" - ], - "text/latex": [ - "\\begin{tabular}{r|llllll}\n", - " DSC & simulate.b5 & analyze.cs & analyze.elbo & analyze & analyze.output.file\\\\\n", - "\\hline\n", - "\t 1 & 0 & 1.0000000, 2.0000000, 0.9081847, 0.9540923, 0.9540923, 1.0000000, 0.9500000 & -1591.2711 & susie & susie/simulate\\_1\\_susie\\_1 \\\\\n", - "\t 1 & 1 & 2.00, 1.00, 1.00, 1.00, 1.00, 0.95 & -1543.7433 & susie & susie/simulate\\_2\\_susie\\_1 \\\\\n", - "\t 2 & 0 & 3.00, 1.00, 1.00, 1.00, 1.00, 0.95 & -1291.7338 & susie & susie/simulate\\_3\\_susie\\_1 \\\\\n", - "\t 2 & 1 & 0.95 & -1005.7306 & susie & susie/simulate\\_4\\_susie\\_1\\\\\n", - "\t 3 & 0 & 2.00, 1.00, 1.00, 1.00, 1.00, 0.95 & -892.7243 & susie & susie/simulate\\_5\\_susie\\_1 \\\\\n", - "\t 3 & 1 & 5.00, 1.00, 1.00, 1.00, 1.00, 0.95 & -1579.8866 & susie & susie/simulate\\_6\\_susie\\_1 \\\\\n", - "\\end{tabular}\n" - ], - "text/markdown": [ - "\n", - "DSC | simulate.b5 | analyze.cs | analyze.elbo | analyze | analyze.output.file | \n", - "|---|---|---|---|---|---|\n", - "| 1 | 0 | 1.0000000, 2.0000000, 0.9081847, 0.9540923, 0.9540923, 1.0000000, 0.9500000 | -1591.2711 | susie | susie/simulate_1_susie_1 | \n", - "| 1 | 1 | 2.00, 1.00, 1.00, 1.00, 1.00, 0.95 | -1543.7433 | susie | susie/simulate_2_susie_1 | \n", - "| 2 | 0 | 3.00, 1.00, 1.00, 1.00, 1.00, 0.95 | -1291.7338 | susie | susie/simulate_3_susie_1 | \n", - "| 2 | 1 | 0.95 | -1005.7306 | susie | susie/simulate_4_susie_1 | \n", - "| 3 | 0 | 2.00, 1.00, 1.00, 1.00, 1.00, 0.95 | -892.7243 | susie | susie/simulate_5_susie_1 | \n", - "| 3 | 1 | 5.00, 1.00, 1.00, 1.00, 1.00, 0.95 | -1579.8866 | susie | susie/simulate_6_susie_1 | \n", - "\n", - "\n" - ], - "text/plain": [ - " DSC simulate.b5\n", - "1 1 0 \n", - "2 1 1 \n", - "3 2 0 \n", - "4 2 1 \n", - "5 3 0 \n", - "6 3 1 \n", - " analyze.cs \n", - "1 1.0000000, 2.0000000, 0.9081847, 0.9540923, 0.9540923, 1.0000000, 0.9500000\n", - "2 2.00, 1.00, 1.00, 1.00, 1.00, 0.95 \n", - "3 3.00, 1.00, 1.00, 1.00, 1.00, 0.95 \n", - "4 0.95 \n", - "5 2.00, 1.00, 1.00, 1.00, 1.00, 0.95 \n", - "6 5.00, 1.00, 1.00, 1.00, 1.00, 0.95 \n", - " analyze.elbo analyze analyze.output.file \n", - "1 -1591.2711 susie susie/simulate_1_susie_1\n", - "2 -1543.7433 susie susie/simulate_2_susie_1\n", - "3 -1291.7338 susie susie/simulate_3_susie_1\n", - "4 -1005.7306 susie susie/simulate_4_susie_1\n", - "5 -892.7243 susie susie/simulate_5_susie_1\n", - "6 -1579.8866 susie susie/simulate_6_susie_1" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "res = tibble::as_tibble(readRDS('ref4_question_20200123.rds'))\n", - "head(res)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### When $x_5$ is not effect variable" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [], - "source": [ - "out1 = res[which(res$simulate.b5==0),]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For default initialization the average ELBO is:" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "-1257.23074916148" - ], - "text/latex": [ - "-1257.23074916148" - ], - "text/markdown": [ - "-1257.23074916148" - ], - "text/plain": [ - "[1] -1257.231" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "mean(out1[which(out1$analyze=='susie'),]$analyze.elbo)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "And for good initialization:" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "-1256.16293512861" - ], - "text/latex": [ - "-1256.16293512861" - ], - "text/markdown": [ - "-1256.16293512861" - ], - "text/plain": [ - "[1] -1256.163" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "mean(out1[which(out1$analyze=='susie_true_init'),]$analyze.elbo)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The ELBO improves slightly on average. For default initialization:" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [], - "source": [ - "out2 = out1[which(out1$analyze=='susie'),]$analyze.cs" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [], - "source": [ - "out <- sapply(out2,function (x) paste(as.character(x$cs),collapse = \" and \"))\n", - "out <- table(factor(out))\n", - "out <- sort(out,decreasing = TRUE)\n", - "out <- as.data.frame(out)\n", - "names(out) <- c(\"CSs\",\"count\")" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\n", - "
CSscount
2 178
3 169
2 and 3 44
3 and 2 35
2:3 19
3:4 17
1:2 11
5
2:4 5
3 and 1:2 5
2 and 3:4 3
4 3
1:3 2
1 1
2 and 1479 1
c(1, 3, 4, 5) 1
c(1, 3) 1
\n" - ], - "text/latex": [ - "\\begin{tabular}{r|ll}\n", - " CSs & count\\\\\n", - "\\hline\n", - "\t 2 & 178 \\\\\n", - "\t 3 & 169 \\\\\n", - "\t 2 and 3 & 44 \\\\\n", - "\t 3 and 2 & 35 \\\\\n", - "\t 2:3 & 19 \\\\\n", - "\t 3:4 & 17 \\\\\n", - "\t 1:2 & 11 \\\\\n", - "\t & 5 \\\\\n", - "\t 2:4 & 5 \\\\\n", - "\t 3 and 1:2 & 5 \\\\\n", - "\t 2 and 3:4 & 3 \\\\\n", - "\t 4 & 3 \\\\\n", - "\t 1:3 & 2 \\\\\n", - "\t 1 & 1 \\\\\n", - "\t 2 and 1479 & 1 \\\\\n", - "\t c(1, 3, 4, 5) & 1 \\\\\n", - "\t c(1, 3) & 1 \\\\\n", - "\\end{tabular}\n" - ], - "text/markdown": [ - "\n", - "CSs | count | \n", - "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", - "| 2 | 178 | \n", - "| 3 | 169 | \n", - "| 2 and 3 | 44 | \n", - "| 3 and 2 | 35 | \n", - "| 2:3 | 19 | \n", - "| 3:4 | 17 | \n", - "| 1:2 | 11 | \n", - "| | 5 | \n", - "| 2:4 | 5 | \n", - "| 3 and 1:2 | 5 | \n", - "| 2 and 3:4 | 3 | \n", - "| 4 | 3 | \n", - "| 1:3 | 2 | \n", - "| 1 | 1 | \n", - "| 2 and 1479 | 1 | \n", - "| c(1, 3, 4, 5) | 1 | \n", - "| c(1, 3) | 1 | \n", - "\n", - "\n" - ], - "text/plain": [ - " CSs count\n", - "1 2 178 \n", - "2 3 169 \n", - "3 2 and 3 44 \n", - "4 3 and 2 35 \n", - "5 2:3 19 \n", - "6 3:4 17 \n", - "7 1:2 11 \n", - "8 5 \n", - "9 2:4 5 \n", - "10 3 and 1:2 5 \n", - "11 2 and 3:4 3 \n", - "12 4 3 \n", - "13 1:3 2 \n", - "14 1 1 \n", - "15 2 and 1479 1 \n", - "16 c(1, 3, 4, 5) 1 \n", - "17 c(1, 3) 1 " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "out" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Here, variable 5 is dropped for all but one replicate, which identified one CS containing variables 1, 3, 4 and 5 (recall only $b_3$ is non-zero).\n", - "\n", - "There are 5 false positive CS, and 5 replicates where no 95% CS is identified." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For \"good\" initialization," - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [], - "source": [ - "out3 = out1[which(out1$analyze=='susie_true_init'),]$analyze.cs\n", - "out <- sapply(out3,function (x) paste(as.character(x$cs),collapse = \" and \"))\n", - "out <- table(factor(out))\n", - "out <- sort(out,decreasing = TRUE)\n", - "out <- as.data.frame(out)\n", - "names(out) <- c(\"CSs\",\"count\")" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\n", - "
CSscount
2 and 3 139
2 136
3 124
3 and 1:2 29
2 and 3:4 26
3:4 14
3:4 and 1:2 8
1:2 7
2 and 3:5 4
1:2 and 3:4 3
2:3 3
4 2
1 1
1:3 1
2 and 1479 1
3 and c(1, 5) 1
3:4 and c(1, 2, 5) 1
\n" - ], - "text/latex": [ - "\\begin{tabular}{r|ll}\n", - " CSs & count\\\\\n", - "\\hline\n", - "\t 2 and 3 & 139 \\\\\n", - "\t 2 & 136 \\\\\n", - "\t 3 & 124 \\\\\n", - "\t 3 and 1:2 & 29 \\\\\n", - "\t 2 and 3:4 & 26 \\\\\n", - "\t 3:4 & 14 \\\\\n", - "\t 3:4 and 1:2 & 8 \\\\\n", - "\t 1:2 & 7 \\\\\n", - "\t 2 and 3:5 & 4 \\\\\n", - "\t 1:2 and 3:4 & 3 \\\\\n", - "\t 2:3 & 3 \\\\\n", - "\t 4 & 2 \\\\\n", - "\t 1 & 1 \\\\\n", - "\t 1:3 & 1 \\\\\n", - "\t 2 and 1479 & 1 \\\\\n", - "\t 3 and c(1, 5) & 1 \\\\\n", - "\t 3:4 and c(1, 2, 5) & 1 \\\\\n", - "\\end{tabular}\n" - ], - "text/markdown": [ - "\n", - "CSs | count | \n", - "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", - "| 2 and 3 | 139 | \n", - "| 2 | 136 | \n", - "| 3 | 124 | \n", - "| 3 and 1:2 | 29 | \n", - "| 2 and 3:4 | 26 | \n", - "| 3:4 | 14 | \n", - "| 3:4 and 1:2 | 8 | \n", - "| 1:2 | 7 | \n", - "| 2 and 3:5 | 4 | \n", - "| 1:2 and 3:4 | 3 | \n", - "| 2:3 | 3 | \n", - "| 4 | 2 | \n", - "| 1 | 1 | \n", - "| 1:3 | 1 | \n", - "| 2 and 1479 | 1 | \n", - "| 3 and c(1, 5) | 1 | \n", - "| 3:4 and c(1, 2, 5) | 1 | \n", - "\n", - "\n" - ], - "text/plain": [ - " CSs count\n", - "1 2 and 3 139 \n", - "2 2 136 \n", - "3 3 124 \n", - "4 3 and 1:2 29 \n", - "5 2 and 3:4 26 \n", - "6 3:4 14 \n", - "7 3:4 and 1:2 8 \n", - "8 1:2 7 \n", - "9 2 and 3:5 4 \n", - "10 1:2 and 3:4 3 \n", - "11 2:3 3 \n", - "12 4 2 \n", - "13 1 1 \n", - "14 1:3 1 \n", - "15 2 and 1479 1 \n", - "16 3 and c(1, 5) 1 \n", - "17 3:4 and c(1, 2, 5) 1 " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "out" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "There are 4 false positive CS, and in all replicates at least one 95% CS is identified. Variable 5 are mostly dropped out, but it occasionally groups with either the first ($x_1$,$x_2$) or the second ($x_3$,$x_4$) set of variables." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### When $x_5$ is effect variable" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [], - "source": [ - "out1 = res[which(res$simulate.b5!=0),]" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "-1433.659595861" - ], - "text/latex": [ - "-1433.659595861" - ], - "text/markdown": [ - "-1433.659595861" - ], - "text/plain": [ - "[1] -1433.66" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "mean(out1[which(out1$analyze=='susie'),]$analyze.elbo)" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "-1433.41607211166" - ], - "text/latex": [ - "-1433.41607211166" - ], - "text/markdown": [ - "-1433.41607211166" - ], - "text/plain": [ - "[1] -1433.416" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "mean(out1[which(out1$analyze=='susie_true_init'),]$analyze.elbo)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "ELBO are not very different when \"good\" initialization was used. For default initialization:" - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": {}, - "outputs": [], - "source": [ - "out2 = out1[which(out1$analyze=='susie'),]$analyze.cs\n", - "out <- sapply(out2,function (x) paste(as.character(x$cs),collapse = \" and \"))\n", - "out <- table(factor(out))\n", - "out <- sort(out,decreasing = TRUE)\n", - "out <- as.data.frame(out)\n", - "names(out) <- c(\"CSs\",\"count\")" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\n", - "
CSscount
5 88
3 86
2 82
1 46
c(1, 5) 20
1:2 19
5 and 2 19
c(3, 5) 15
2 and 5 14
3:4 13
12
2:3 9
3 and 2 8
5 and 3 8
2 and 3 7
3 and 1 5
3:5 4
1 and 3 3
2:4 3
3 and 5 3
3 and 5 and 2 3
5 and 2 and 3 3
c(1, 3, 5) 3
1 and 3 and 2 2
2 and 5 and 3 2
2 and c(3, 5) 2
3 and 1:2 2
3 and 2 and 5 2
c(1, 2, 4) 2
1 and 3:4 1
1:3 1
1479 and 1:2 1
2 and 1 1
2 and 3 and 1 1
2 and 5 and 3:4 1
4 1
4:5 1
5 and 1:2 1
5 and 2 and 3:4 1
5 and 2:3 1
5 and 3 and 2 1
5 and 3:4 1
c(1, 2, 3, 5) 1
c(2, 4) 1
\n" - ], - "text/latex": [ - "\\begin{tabular}{r|ll}\n", - " CSs & count\\\\\n", - "\\hline\n", - "\t 5 & 88 \\\\\n", - "\t 3 & 86 \\\\\n", - "\t 2 & 82 \\\\\n", - "\t 1 & 46 \\\\\n", - "\t c(1, 5) & 20 \\\\\n", - "\t 1:2 & 19 \\\\\n", - "\t 5 and 2 & 19 \\\\\n", - "\t c(3, 5) & 15 \\\\\n", - "\t 2 and 5 & 14 \\\\\n", - "\t 3:4 & 13 \\\\\n", - "\t & 12 \\\\\n", - "\t 2:3 & 9 \\\\\n", - "\t 3 and 2 & 8 \\\\\n", - "\t 5 and 3 & 8 \\\\\n", - "\t 2 and 3 & 7 \\\\\n", - "\t 3 and 1 & 5 \\\\\n", - "\t 3:5 & 4 \\\\\n", - "\t 1 and 3 & 3 \\\\\n", - "\t 2:4 & 3 \\\\\n", - "\t 3 and 5 & 3 \\\\\n", - "\t 3 and 5 and 2 & 3 \\\\\n", - "\t 5 and 2 and 3 & 3 \\\\\n", - "\t c(1, 3, 5) & 3 \\\\\n", - "\t 1 and 3 and 2 & 2 \\\\\n", - "\t 2 and 5 and 3 & 2 \\\\\n", - "\t 2 and c(3, 5) & 2 \\\\\n", - "\t 3 and 1:2 & 2 \\\\\n", - "\t 3 and 2 and 5 & 2 \\\\\n", - "\t c(1, 2, 4) & 2 \\\\\n", - "\t 1 and 3:4 & 1 \\\\\n", - "\t 1:3 & 1 \\\\\n", - "\t 1479 and 1:2 & 1 \\\\\n", - "\t 2 and 1 & 1 \\\\\n", - "\t 2 and 3 and 1 & 1 \\\\\n", - "\t 2 and 5 and 3:4 & 1 \\\\\n", - "\t 4 & 1 \\\\\n", - "\t 4:5 & 1 \\\\\n", - "\t 5 and 1:2 & 1 \\\\\n", - "\t 5 and 2 and 3:4 & 1 \\\\\n", - "\t 5 and 2:3 & 1 \\\\\n", - "\t 5 and 3 and 2 & 1 \\\\\n", - "\t 5 and 3:4 & 1 \\\\\n", - "\t c(1, 2, 3, 5) & 1 \\\\\n", - "\t c(2, 4) & 1 \\\\\n", - "\\end{tabular}\n" - ], - "text/markdown": [ - "\n", - "CSs | count | \n", - "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", - "| 5 | 88 | \n", - "| 3 | 86 | \n", - "| 2 | 82 | \n", - "| 1 | 46 | \n", - "| c(1, 5) | 20 | \n", - "| 1:2 | 19 | \n", - "| 5 and 2 | 19 | \n", - "| c(3, 5) | 15 | \n", - "| 2 and 5 | 14 | \n", - "| 3:4 | 13 | \n", - "| | 12 | \n", - "| 2:3 | 9 | \n", - "| 3 and 2 | 8 | \n", - "| 5 and 3 | 8 | \n", - "| 2 and 3 | 7 | \n", - "| 3 and 1 | 5 | \n", - "| 3:5 | 4 | \n", - "| 1 and 3 | 3 | \n", - "| 2:4 | 3 | \n", - "| 3 and 5 | 3 | \n", - "| 3 and 5 and 2 | 3 | \n", - "| 5 and 2 and 3 | 3 | \n", - "| c(1, 3, 5) | 3 | \n", - "| 1 and 3 and 2 | 2 | \n", - "| 2 and 5 and 3 | 2 | \n", - "| 2 and c(3, 5) | 2 | \n", - "| 3 and 1:2 | 2 | \n", - "| 3 and 2 and 5 | 2 | \n", - "| c(1, 2, 4) | 2 | \n", - "| 1 and 3:4 | 1 | \n", - "| 1:3 | 1 | \n", - "| 1479 and 1:2 | 1 | \n", - "| 2 and 1 | 1 | \n", - "| 2 and 3 and 1 | 1 | \n", - "| 2 and 5 and 3:4 | 1 | \n", - "| 4 | 1 | \n", - "| 4:5 | 1 | \n", - "| 5 and 1:2 | 1 | \n", - "| 5 and 2 and 3:4 | 1 | \n", - "| 5 and 2:3 | 1 | \n", - "| 5 and 3 and 2 | 1 | \n", - "| 5 and 3:4 | 1 | \n", - "| c(1, 2, 3, 5) | 1 | \n", - "| c(2, 4) | 1 | \n", - "\n", - "\n" - ], - "text/plain": [ - " CSs count\n", - "1 5 88 \n", - "2 3 86 \n", - "3 2 82 \n", - "4 1 46 \n", - "5 c(1, 5) 20 \n", - "6 1:2 19 \n", - "7 5 and 2 19 \n", - "8 c(3, 5) 15 \n", - "9 2 and 5 14 \n", - "10 3:4 13 \n", - "11 12 \n", - "12 2:3 9 \n", - "13 3 and 2 8 \n", - "14 5 and 3 8 \n", - "15 2 and 3 7 \n", - "16 3 and 1 5 \n", - "17 3:5 4 \n", - "18 1 and 3 3 \n", - "19 2:4 3 \n", - "20 3 and 5 3 \n", - "21 3 and 5 and 2 3 \n", - "22 5 and 2 and 3 3 \n", - "23 c(1, 3, 5) 3 \n", - "24 1 and 3 and 2 2 \n", - "25 2 and 5 and 3 2 \n", - "26 2 and c(3, 5) 2 \n", - "27 3 and 1:2 2 \n", - "28 3 and 2 and 5 2 \n", - "29 c(1, 2, 4) 2 \n", - "30 1 and 3:4 1 \n", - "31 1:3 1 \n", - "32 1479 and 1:2 1 \n", - "33 2 and 1 1 \n", - "34 2 and 3 and 1 1 \n", - "35 2 and 5 and 3:4 1 \n", - "36 4 1 \n", - "37 4:5 1 \n", - "38 5 and 1:2 1 \n", - "39 5 and 2 and 3:4 1 \n", - "40 5 and 2:3 1 \n", - "41 5 and 3 and 2 1 \n", - "42 5 and 3:4 1 \n", - "43 c(1, 2, 3, 5) 1 \n", - "44 c(2, 4) 1 " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "out" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For \"good\" initialization:" - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [], - "source": [ - "out2 = out1[which(out1$analyze=='susie_true_init'),]$analyze.cs\n", - "out <- sapply(out2,function (x) paste(as.character(x$cs),collapse = \" and \"))\n", - "out <- table(factor(out))\n", - "out <- sort(out,decreasing = TRUE)\n", - "out <- as.data.frame(out)\n", - "names(out) <- c(\"CSs\",\"count\")" - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\t\n", - "\n", - "
CSscount
2 and 3 and 5 75
5 65
2 48
2 and 5 48
3 47
2 and 3 32
3 and 5 29
2 and 5 and 3:4 14
1:2 13
1 11
2 and 3:4 11
3:4 11
c(1, 5) 11
3 and 1:2 10
3:4 and c(1, 5) 8
5 and 1:2 8
5 and 3:4 7
3 and c(1, 5) 5
2 and c(1, 5) 4
3 and 5 and 1:2 4
c(3, 5) 4
1 and 3 3
2 and 3 and c(1, 5) 3
1 and 5 2
1:2 and 3:4 2
2 and 3:5 2
2 and c(3, 5) 2
2:3 2
3:4 and 1:2 2
1 and 3 and 5 1
1 and 3:4 1
1:2 and 4:5 1
1479 and 1:2 and 3:4 1
2 and 3:4 and c(1, 5) 1
2 and 4 and 5 1
3 and 1 1
3:4 and 4:5 1
3:5 1
4 1
4 and 1:2 1
4 and c(1, 5) 1
5 and 1:2 and 3:4 1
5 and 2:3 1
c(1, 2, 3, 5) 1
c(1, 3, 5) 1
c(1, 5) and c(3, 5) 1
\n" - ], - "text/latex": [ - "\\begin{tabular}{r|ll}\n", - " CSs & count\\\\\n", - "\\hline\n", - "\t 2 and 3 and 5 & 75 \\\\\n", - "\t 5 & 65 \\\\\n", - "\t 2 & 48 \\\\\n", - "\t 2 and 5 & 48 \\\\\n", - "\t 3 & 47 \\\\\n", - "\t 2 and 3 & 32 \\\\\n", - "\t 3 and 5 & 29 \\\\\n", - "\t 2 and 5 and 3:4 & 14 \\\\\n", - "\t 1:2 & 13 \\\\\n", - "\t 1 & 11 \\\\\n", - "\t 2 and 3:4 & 11 \\\\\n", - "\t 3:4 & 11 \\\\\n", - "\t c(1, 5) & 11 \\\\\n", - "\t 3 and 1:2 & 10 \\\\\n", - "\t 3:4 and c(1, 5) & 8 \\\\\n", - "\t 5 and 1:2 & 8 \\\\\n", - "\t 5 and 3:4 & 7 \\\\\n", - "\t 3 and c(1, 5) & 5 \\\\\n", - "\t 2 and c(1, 5) & 4 \\\\\n", - "\t 3 and 5 and 1:2 & 4 \\\\\n", - "\t c(3, 5) & 4 \\\\\n", - "\t 1 and 3 & 3 \\\\\n", - "\t 2 and 3 and c(1, 5) & 3 \\\\\n", - "\t 1 and 5 & 2 \\\\\n", - "\t 1:2 and 3:4 & 2 \\\\\n", - "\t 2 and 3:5 & 2 \\\\\n", - "\t 2 and c(3, 5) & 2 \\\\\n", - "\t 2:3 & 2 \\\\\n", - "\t 3:4 and 1:2 & 2 \\\\\n", - "\t 1 and 3 and 5 & 1 \\\\\n", - "\t 1 and 3:4 & 1 \\\\\n", - "\t 1:2 and 4:5 & 1 \\\\\n", - "\t 1479 and 1:2 and 3:4 & 1 \\\\\n", - "\t 2 and 3:4 and c(1, 5) & 1 \\\\\n", - "\t 2 and 4 and 5 & 1 \\\\\n", - "\t 3 and 1 & 1 \\\\\n", - "\t 3:4 and 4:5 & 1 \\\\\n", - "\t 3:5 & 1 \\\\\n", - "\t 4 & 1 \\\\\n", - "\t 4 and 1:2 & 1 \\\\\n", - "\t 4 and c(1, 5) & 1 \\\\\n", - "\t 5 and 1:2 and 3:4 & 1 \\\\\n", - "\t 5 and 2:3 & 1 \\\\\n", - "\t c(1, 2, 3, 5) & 1 \\\\\n", - "\t c(1, 3, 5) & 1 \\\\\n", - "\t c(1, 5) and c(3, 5) & 1 \\\\\n", - "\\end{tabular}\n" - ], - "text/markdown": [ - "\n", - "CSs | count | \n", - "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", - "| 2 and 3 and 5 | 75 | \n", - "| 5 | 65 | \n", - "| 2 | 48 | \n", - "| 2 and 5 | 48 | \n", - "| 3 | 47 | \n", - "| 2 and 3 | 32 | \n", - "| 3 and 5 | 29 | \n", - "| 2 and 5 and 3:4 | 14 | \n", - "| 1:2 | 13 | \n", - "| 1 | 11 | \n", - "| 2 and 3:4 | 11 | \n", - "| 3:4 | 11 | \n", - "| c(1, 5) | 11 | \n", - "| 3 and 1:2 | 10 | \n", - "| 3:4 and c(1, 5) | 8 | \n", - "| 5 and 1:2 | 8 | \n", - "| 5 and 3:4 | 7 | \n", - "| 3 and c(1, 5) | 5 | \n", - "| 2 and c(1, 5) | 4 | \n", - "| 3 and 5 and 1:2 | 4 | \n", - "| c(3, 5) | 4 | \n", - "| 1 and 3 | 3 | \n", - "| 2 and 3 and c(1, 5) | 3 | \n", - "| 1 and 5 | 2 | \n", - "| 1:2 and 3:4 | 2 | \n", - "| 2 and 3:5 | 2 | \n", - "| 2 and c(3, 5) | 2 | \n", - "| 2:3 | 2 | \n", - "| 3:4 and 1:2 | 2 | \n", - "| 1 and 3 and 5 | 1 | \n", - "| 1 and 3:4 | 1 | \n", - "| 1:2 and 4:5 | 1 | \n", - "| 1479 and 1:2 and 3:4 | 1 | \n", - "| 2 and 3:4 and c(1, 5) | 1 | \n", - "| 2 and 4 and 5 | 1 | \n", - "| 3 and 1 | 1 | \n", - "| 3:4 and 4:5 | 1 | \n", - "| 3:5 | 1 | \n", - "| 4 | 1 | \n", - "| 4 and 1:2 | 1 | \n", - "| 4 and c(1, 5) | 1 | \n", - "| 5 and 1:2 and 3:4 | 1 | \n", - "| 5 and 2:3 | 1 | \n", - "| c(1, 2, 3, 5) | 1 | \n", - "| c(1, 3, 5) | 1 | \n", - "| c(1, 5) and c(3, 5) | 1 | \n", - "\n", - "\n" - ], - "text/plain": [ - " CSs count\n", - "1 2 and 3 and 5 75 \n", - "2 5 65 \n", - "3 2 48 \n", - "4 2 and 5 48 \n", - "5 3 47 \n", - "6 2 and 3 32 \n", - "7 3 and 5 29 \n", - "8 2 and 5 and 3:4 14 \n", - "9 1:2 13 \n", - "10 1 11 \n", - "11 2 and 3:4 11 \n", - "12 3:4 11 \n", - "13 c(1, 5) 11 \n", - "14 3 and 1:2 10 \n", - "15 3:4 and c(1, 5) 8 \n", - "16 5 and 1:2 8 \n", - "17 5 and 3:4 7 \n", - "18 3 and c(1, 5) 5 \n", - "19 2 and c(1, 5) 4 \n", - "20 3 and 5 and 1:2 4 \n", - "21 c(3, 5) 4 \n", - "22 1 and 3 3 \n", - "23 2 and 3 and c(1, 5) 3 \n", - "24 1 and 5 2 \n", - "25 1:2 and 3:4 2 \n", - "26 2 and 3:5 2 \n", - "27 2 and c(3, 5) 2 \n", - "28 2:3 2 \n", - "29 3:4 and 1:2 2 \n", - "30 1 and 3 and 5 1 \n", - "31 1 and 3:4 1 \n", - "32 1:2 and 4:5 1 \n", - "33 1479 and 1:2 and 3:4 1 \n", - "34 2 and 3:4 and c(1, 5) 1 \n", - "35 2 and 4 and 5 1 \n", - "36 3 and 1 1 \n", - "37 3:4 and 4:5 1 \n", - "38 3:5 1 \n", - "39 4 1 \n", - "40 4 and 1:2 1 \n", - "41 4 and c(1, 5) 1 \n", - "42 5 and 1:2 and 3:4 1 \n", - "43 5 and 2:3 1 \n", - "44 c(1, 2, 3, 5) 1 \n", - "45 c(1, 3, 5) 1 \n", - "46 c(1, 5) and c(3, 5) 1 " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "out" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "R", - "language": "R", - "name": "ir" - }, - "language_info": { - "codemirror_mode": "r", - "file_extension": ".r", - "mimetype": "text/x-r-source", - "name": "R", - "pygments_lexer": "r", - "version": "3.5.1" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/ref_question_dsc/README.md b/ref_question_dsc/README.md deleted file mode 100644 index 91a9468..0000000 --- a/ref_question_dsc/README.md +++ /dev/null @@ -1,9 +0,0 @@ -For details of this experiment see: - -https://stephenslab.github.io/susie-paper/manuscript_results/ref4_question.html - -To run the experiment: - -``` -dsc ref4_question.dsc --host midway2 -``` diff --git a/ref_question_dsc/midway2.yml b/ref_question_dsc/midway2.yml deleted file mode 100644 index 682ce27..0000000 --- a/ref_question_dsc/midway2.yml +++ /dev/null @@ -1,47 +0,0 @@ -DSC: - midway2: - description: UChicago RCC cluster Midway 2 - queue_type: pbs - status_check_interval: 30 - max_running_jobs: 50 - max_cores: 40 - max_walltime: "36:00:00" - max_mem: 64G - task_template: | - #!/bin/bash - #{partition} - #{account} - #SBATCH --time={walltime} - #SBATCH --nodes={nodes} - #SBATCH --cpus-per-task={cores} - #SBATCH --mem={mem//10**9}G - #SBATCH --job-name={job_name} - #SBATCH --output={cur_dir}/{job_name}.out - #SBATCH --error={cur_dir}/{job_name}.err - cd {cur_dir} - module load R 2> /dev/null - partition: "SBATCH --partition=broadwl" - account: "" - submit_cmd: sbatch {job_file} - submit_cmd_output: "Submitted batch job {job_id}" - status_cmd: squeue --job {job_id} - kill_cmd: scancel {job_id} - stephenslab: - based_on: midway2 - max_cores: 28 - max_mem: 128G - max_walltime: "10d" - partition: "SBATCH --partition=mstephens" - account: "SBATCH --account=pi-mstephens" - -default: - queue: midway2 - instances_per_job: 100 - nodes_per_job: 1 - instances_per_node: 8 - cpus_per_instance: 1 - mem_per_instance: 2G - time_per_instance: 5m - -susie: - instances_per_job: 10 diff --git a/ref_question_dsc/ref4_question.dsc b/ref_question_dsc/ref4_question.dsc deleted file mode 100644 index b3c8064..0000000 --- a/ref_question_dsc/ref4_question.dsc +++ /dev/null @@ -1,28 +0,0 @@ -simulate: simulate.R - @CONF: R_libs = (Matrix, MASS) - n: 600 - p: 2000 - b5: 0, 1 - # total PVE by all variables - pve: 0.2 - $X: X - $y: y - $b: b - -susie: R(res <- susieR::susie(X,y,L=L,estimate_prior_method="simple",max_iter=1000)) - @CONF: R_libs = susieR - X: $X - y: $y - L: 8 - $cs: res$sets - $elbo: susieR::susie_get_objective(res) - -susie_true_init(susie): R(s_init = susieR::susie_init_coef(which(b!=0), b[which(b!=0)], ncol(X)); - res <- susieR::susie(X,y,L=L,s_init=s_init,estimate_prior_method="simple",max_iter=1000)) - b: $b - -DSC: - define: - analyze: susie, susie_true_init - run: simulate * analyze - replicate: 500 \ No newline at end of file diff --git a/ref_question_dsc/ref4_question_20200120.rds b/ref_question_dsc/ref4_question_20200120.rds deleted file mode 100644 index c5993a1..0000000 Binary files a/ref_question_dsc/ref4_question_20200120.rds and /dev/null differ diff --git a/ref_question_dsc/ref4_question_20200123.rds b/ref_question_dsc/ref4_question_20200123.rds deleted file mode 100644 index b7ba067..0000000 Binary files a/ref_question_dsc/ref4_question_20200123.rds and /dev/null differ diff --git a/ref_question_dsc/simulate.R b/ref_question_dsc/simulate.R deleted file mode 100644 index 69561d2..0000000 --- a/ref_question_dsc/simulate.R +++ /dev/null @@ -1,17 +0,0 @@ -h = 0.92 -l = 0.7 -cormat = rbind(c(1.0, h, l, l, 0.9), - c(h, 1.0, l, l, 0.7), - c(l, l, 1.0, h, 0.8), - c(l, l, h, 1.0, 0.8), - c(0.9, 0.7, 0.8, 0.8, 1.0)) -covmat = as.matrix(Matrix::nearPD(cormat)$mat) -X = MASS::mvrnorm(n=n, rep(0,nrow(covmat)), covmat) -X = cbind(X, matrix(rnorm(n * (p - ncol(X))), nrow=n, ncol=(p - ncol(X)))) -b = rep(0,p) -b[c(2,3)] = rnorm(2) -b[5] = b5 * rnorm(1) -y = X %*% b -sigma = sqrt(var(y)*(1-pve)/pve) -epsilon = rnorm(n, mean = 0, sd = sigma) -y = y + epsilon \ No newline at end of file