Prevent return inferencedata error + cheater/liar fix #536

Open · wants to merge 2 commits into base: master
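Newer PyMC3 releases break these notebooks: version 3.9 added the `return_inferencedata` keyword to `pm.sample`, and later releases warn that an ArviZ `InferenceData` object will become the default return type. An `InferenceData` cannot be sliced for burn-in the way the old `MultiTrace` can, so calls like `trace[1000:]` fail. This PR passes `return_inferencedata=False` explicitly wherever the notebooks slice or index the trace. A minimal sketch of the failure mode and the fix, assuming PyMC3 >= 3.9 (the toy model below is illustrative, not taken from the notebooks):

    import pymc3 as pm

    with pm.Model():
        p = pm.Uniform("p", lower=0, upper=1)
        obs = pm.Bernoulli("obs", p, observed=[0, 1, 1, 0, 1])
        step = pm.Metropolis()
        # Without return_inferencedata=False, newer PyMC3 returns an
        # arviz.InferenceData, which supports neither trace[1000:] nor
        # trace["p"] in the way the notebooks expect.
        trace = pm.sample(2000, step=step, return_inferencedata=False)

    burned_trace = trace[1000:]    # MultiTrace supports burn-in slicing
    p_samples = burned_trace["p"]  # and name-based sample extraction

The second commit corrects "frequency of liars" to "frequency of cheaters" in the Privacy Algorithm discussion: the parameter being inferred is the proportion of students who cheated, not the proportion who answered untruthfully (by design, half of the "Yes" answers are coin noise, not lies).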
2 changes: 1 addition & 1 deletion Chapter2_MorePyMC/Ch2_MorePyMC_PyMC2.ipynb
@@ -1130,7 +1130,7 @@
"\n",
"> In the interview process for each student, the student flips a coin, hidden from the interviewer. The student agrees to answer honestly if the coin comes up heads. Otherwise, if the coin comes up tails, the student (secretly) flips the coin again, and answers \"Yes, I did cheat\" if the coin flip lands heads, and \"No, I did not cheat\", if the coin flip lands tails. This way, the interviewer does not know if a \"Yes\" was the result of a guilty plea, or a Heads on a second coin toss. Thus privacy is preserved and the researchers receive honest answers. \n",
"\n",
"I call this the Privacy Algorithm. One could of course argue that the interviewers are still receiving false data since some *Yes*'s are not confessions but instead randomness, but an alternative perspective is that the researchers are discarding approximately half of their original dataset since half of the responses will be noise. But they have gained a systematic data generation process that can be modeled. Furthermore, they do not have to incorporate (perhaps somewhat naively) the possibility of deceitful answers. We can use PyMC to dig through this noisy model, and find a posterior distribution for the true frequency of liars. "
"I call this the Privacy Algorithm. One could of course argue that the interviewers are still receiving false data since some *Yes*'s are not confessions but instead randomness, but an alternative perspective is that the researchers are discarding approximately half of their original dataset since half of the responses will be noise. But they have gained a systematic data generation process that can be modeled. Furthermore, they do not have to incorporate (perhaps somewhat naively) the possibility of deceitful answers. We can use PyMC to dig through this noisy model, and find a posterior distribution for the true frequency of cheaters. "
]
},
{
12 changes: 6 additions & 6 deletions Chapter2_MorePyMC/Ch2_MorePyMC_PyMC3.ipynb
@@ -861,7 +861,7 @@
" obs = pm.Bernoulli(\"obs\", p, observed=occurrences)\n",
" # To be explained in chapter 3\n",
" step = pm.Metropolis()\n",
" trace = pm.sample(18000, step=step)\n",
" trace = pm.sample(18000, step=step, return_inferencedata=False)\n",
" burned_trace = trace[1000:]"
]
},
@@ -998,7 +998,7 @@
"\n",
" # To be explained in chapter 3.\n",
" step = pm.Metropolis()\n",
" trace = pm.sample(20000, step=step)\n",
" trace = pm.sample(20000, step=step, return_inferencedata=False)\n",
" burned_trace=trace[1000:]"
]
},
@@ -1200,7 +1200,7 @@
"\n",
"> In the interview process for each student, the student flips a coin, hidden from the interviewer. The student agrees to answer honestly if the coin comes up heads. Otherwise, if the coin comes up tails, the student (secretly) flips the coin again, and answers \"Yes, I did cheat\" if the coin flip lands heads, and \"No, I did not cheat\", if the coin flip lands tails. This way, the interviewer does not know if a \"Yes\" was the result of a guilty plea, or a Heads on a second coin toss. Thus privacy is preserved and the researchers receive honest answers. \n",
"\n",
"I call this the Privacy Algorithm. One could of course argue that the interviewers are still receiving false data since some *Yes*'s are not confessions but instead randomness, but an alternative perspective is that the researchers are discarding approximately half of their original dataset since half of the responses will be noise. But they have gained a systematic data generation process that can be modeled. Furthermore, they do not have to incorporate (perhaps somewhat naively) the possibility of deceitful answers. We can use PyMC3 to dig through this noisy model, and find a posterior distribution for the true frequency of liars. "
"I call this the Privacy Algorithm. One could of course argue that the interviewers are still receiving false data since some *Yes*'s are not confessions but instead randomness, but an alternative perspective is that the researchers are discarding approximately half of their original dataset since half of the responses will be noise. But they have gained a systematic data generation process that can be modeled. Furthermore, they do not have to incorporate (perhaps somewhat naively) the possibility of deceitful answers. We can use PyMC3 to dig through this noisy model, and find a posterior distribution for the true frequency of cheaters. "
]
},
{
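Supporting note for the wording fix above: under the Privacy Algorithm, a student answers "Yes" with probability 0.5 * p + 0.25, where p is the true cheating frequency, and the model inverts that relationship to recover p. A standalone simulation sketch (the value p = 0.3 is illustrative, not from the notebook):

    import numpy as np

    rng = np.random.default_rng(0)
    p_cheat = 0.3  # illustrative true cheating frequency
    n = 100_000

    first_heads = rng.random(n) < 0.5     # heads on flip 1: answer honestly
    honest_yes = rng.random(n) < p_cheat  # honest answer is "Yes" iff cheated
    second_heads = rng.random(n) < 0.5    # flip 2 decides the forced answer

    said_yes = np.where(first_heads, honest_yes, second_heads)
    print(said_yes.mean(), 0.5 * p_cheat + 0.25)  # both come out near 0.40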
@@ -1403,7 +1403,7 @@
"# To be explained in Chapter 3!\n",
"with model:\n",
" step = pm.Metropolis(vars=[p])\n",
" trace = pm.sample(40000, step=step)\n",
" trace = pm.sample(40000, step=step, return_inferencedata=False)\n",
" burned_trace = trace[15000:]"
]
},
@@ -1534,7 +1534,7 @@
"with model:\n",
" # To Be Explained in Chapter 3!\n",
" step = pm.Metropolis()\n",
" trace = pm.sample(25000, step=step)\n",
" trace = pm.sample(25000, step=step, return_inferencedata=False)\n",
" burned_trace = trace[2500:]"
]
},
@@ -1929,7 +1929,7 @@
" # Mysterious code to be explained in Chapter 3\n",
" start = pm.find_MAP()\n",
" step = pm.Metropolis()\n",
" trace = pm.sample(120000, step=step, start=start)\n",
" trace = pm.sample(120000, step=step, start=start, return_inferencedata=False)\n",
" burned_trace = trace[100000::2]"
]
},
2 changes: 1 addition & 1 deletion Chapter2_MorePyMC/Ch2_MorePyMC_TFP.ipynb
@@ -2082,7 +2082,7 @@
"\n",
"> In the interview process for each student, the student flips a coin, hidden from the interviewer. The student agrees to answer honestly if the coin comes up heads. Otherwise, if the coin comes up tails, the student (secretly) flips the coin again, and answers \"Yes, I did cheat\" if the coin flip lands heads, and \"No, I did not cheat\", if the coin flip lands tails. This way, the interviewer does not know if a \"Yes\" was the result of a guilty plea, or a Heads on a second coin toss. Thus privacy is preserved and the researchers receive honest answers. \n",
"\n",
"I call this the Privacy Algorithm. One could of course argue that the interviewers are still receiving false data since some *Yes*'s are not confessions but instead randomness, but an alternative perspective is that the researchers are discarding approximately half of their original dataset since half of the responses will be noise. But they have gained a systematic data generation process that can be modeled. Furthermore, they do not have to incorporate (perhaps somewhat naively) the possibility of deceitful answers. We can use TFP to dig through this noisy model, and find a posterior distribution for the true frequency of liars. "
"I call this the Privacy Algorithm. One could of course argue that the interviewers are still receiving false data since some *Yes*'s are not confessions but instead randomness, but an alternative perspective is that the researchers are discarding approximately half of their original dataset since half of the responses will be noise. But they have gained a systematic data generation process that can be modeled. Furthermore, they do not have to incorporate (perhaps somewhat naively) the possibility of deceitful answers. We can use TFP to dig through this noisy model, and find a posterior distribution for the true frequency of cheaters. "
]
},
{
10 changes: 5 additions & 5 deletions Chapter3_MCMC/Ch3_IntroMCMC_PyMC3.ipynb
@@ -488,7 +488,7 @@
"with model:\n",
" step1 = pm.Metropolis(vars=[p, sds, centers])\n",
" step2 = pm.ElemwiseCategorical(vars=[assignment])\n",
" trace = pm.sample(25000, step=[step1, step2])"
" trace = pm.sample(25000, step=[step1, step2], return_inferencedata=False)"
]
},
{
@@ -582,7 +582,7 @@
],
"source": [
"with model:\n",
" trace = pm.sample(50000, step=[step1, step2], trace=trace)"
" trace = pm.sample(50000, step=[step1, step2], trace=trace, return_inferencedata=False)"
]
},
{
@@ -846,7 +846,7 @@
" x = pm.Normal(\"x\", mu=4, tau=10)\n",
" y = pm.Deterministic(\"y\", 10 - x)\n",
"\n",
" trace_2 = pm.sample(10000, pm.Metropolis())\n",
" trace_2 = pm.sample(10000, pm.Metropolis(), return_inferencedata=False)\n",
"\n",
"plt.plot(trace_2[\"x\"])\n",
"plt.plot(trace_2[\"y\"])\n",
@@ -941,7 +941,7 @@
"Of course, we do not know where the MAP is. PyMC3 provides a function that will approximate, if not find, the MAP location. In the PyMC3 main namespace is the `find_MAP` function. If you call this function within the context of `Model()`, it will calculate the MAP which you can then pass to `pm.sample()` as a `start` parameter.\n",
"\n",
" start = pm.find_MAP()\n",
" trace = pm.sample(2000, step=pm.Metropolis, start=start)\n",
" trace = pm.sample(2000, step=pm.Metropolis, start=start, return_inferencedata=False)\n",
"\n",
"The `find_MAP()` function has the flexibility of allowing the user to choose which optimization algorithm to use (after all, this is a optimization problem: we are looking for the values that maximize our landscape), as not all optimization algorithms are created equal. The default optimization algorithm in function call is the Broyden-Fletcher-Goldfarb-Shanno ([BFGS](https://en.wikipedia.org/wiki/Broyden-Fletcher-Goldfarb-Shanno_algorithm)) algorithm to find the maximum of the log-posterior. As an alternative, you can use other optimization algorithms from the `scipy.optimize` module. For example, you can use Powell's Method, a favourite of PyMC blogger [Abraham Flaxman](http://healthyalgorithms.com/) [1], by calling `find_MAP(fmin=scipy.optimize.fmin_powell)`. The default works well enough, but if convergence is slow or not guaranteed, feel free to experiment with Powell's method or the other algorithms available. \n",
"\n",
@@ -955,7 +955,7 @@
" start = pm.find_MAP()\n",
" \n",
" step = pm.Metropolis()\n",
" trace = pm.sample(100000, step=step, start=start)\n",
" trace = pm.sample(100000, step=step, start=start, return_inferencedata=False)\n",
" \n",
" burned_trace = trace[50000:]\n"
]
@@ -584,7 +584,7 @@
" upvote_ratio = pm.Uniform(\"upvote_ratio\", 0, 1)\n",
" observations = pm.Binomial( \"obs\", N, upvote_ratio, observed=upvotes)\n",
" \n",
" trace = pm.sample(samples, step=pm.Metropolis())\n",
" trace = pm.sample(samples, step=pm.Metropolis(), return_inferencedata=False)\n",
" \n",
" burned_trace = trace[int(samples/4):]\n",
" return burned_trace[\"upvote_ratio\"]\n",
8 changes: 4 additions & 4 deletions Chapter5_LossFunctions/Ch5_LossFunctions_PyMC3.ipynb
@@ -267,7 +267,7 @@
" error = pm.Potential(\"error\", logp)\n",
" \n",
"\n",
" trace = pm.sample(50000, step=pm.Metropolis())\n",
" trace = pm.sample(50000, step=pm.Metropolis(), return_inferencedata=False)\n",
" burned_trace = trace[10000:]\n",
"\n",
"price_trace = burned_trace[\"true_price\"]"
@@ -655,7 +655,7 @@
" \n",
" obs = pm.Normal(\"obs\", mu=mean, sd=std, observed=Y)\n",
" \n",
" trace = pm.sample(100000, step=pm.Metropolis())\n",
" trace = pm.sample(100000, step=pm.Metropolis(), return_inferencedata=False)\n",
" burned_trace = trace[20000:] "
]
},
@@ -1000,7 +1000,7 @@
"with model:\n",
" mu, sds, elbo = pm.variational.advi(n=50000)\n",
" step = pm.NUTS(scaling=model.dict_to_array(sds), is_cov=True)\n",
" trace = pm.sample(5000, step=step, start=mu)"
" trace = pm.sample(5000, step=step, start=mu, return_inferencedata=False)"
]
},
{
@@ -1247,7 +1247,7 @@
" \n",
" mu, sds, elbo = pm.variational.advi(n=50000)\n",
" step = pm.NUTS(scaling=model.dict_to_array(sds), is_cov=True)\n",
" trace = pm.sample(samples, step=step, start=mu)\n",
" trace = pm.sample(samples, step=step, start=mu, return_inferencedata=False)\n",
" \n",
" burned_trace = trace[burn_in:]\n",
" return burned_trace[\"halo_positions\"]"
2 changes: 1 addition & 1 deletion Chapter6_Priorities/Ch6_Priors_PyMC3.ipynb
@@ -1229,7 +1229,7 @@
"with model:\n",
" obs = pm.MvNormal(\"observed returns\", mu=mu, cov=cov_matrix, observed=stock_returns)\n",
" step = pm.NUTS()\n",
" trace = pm.sample(5000, step=step)"
" trace = pm.sample(5000, step=step, return_inferencedata=False)"
]
},
{