Commit

New experimental python interface for sampling. New sampler comparisons. (#90)

* New experimental python interface for sampling. New sampler comparisons.
* More details on the rosenbrock_simple.ipynb notebook. Ran black.
vitenti committed Apr 4, 2023
1 parent 5f02f6d commit 6227927
Showing 9 changed files with 828 additions and 19 deletions.
10 changes: 7 additions & 3 deletions notebooks/apes_tests/rosenbrock_mcmc.ipynb
@@ -75,13 +75,13 @@
"Ncm.cfg_init()\n",
"Ncm.cfg_set_log_handler(lambda msg: sys.stdout.write(msg) and sys.stdout.flush())\n",
"\n",
"ssize=5000000\n",
"ssize = 5000000\n",
"nthreads = 4\n",
"nwalkers = 320\n",
"burnin = 5000\n",
"thin = 10\n",
"save_figs = True\n",
"verbose=False\n",
"verbose = False\n",
"\n",
"Ncm.func_eval_set_max_threads(nthreads)\n",
"Ncm.func_eval_log_pool_stats()"
@@ -98,7 +98,11 @@
" sampler=\"apes\", nwalkers=nwalkers, ssize=ssize, verbose=verbose, nthreads=nthreads\n",
")\n",
"catalog_stretch = run_rosenbrock_mcmc(\n",
" sampler=\"stretch\", nwalkers=nwalkers, ssize=ssize, verbose=verbose, nthreads=nthreads\n",
" sampler=\"stretch\",\n",
" nwalkers=nwalkers,\n",
" ssize=ssize,\n",
" verbose=verbose,\n",
" nthreads=nthreads,\n",
")"
]
},
361 changes: 361 additions & 0 deletions notebooks/apes_tests/rosenbrock_simple.ipynb
@@ -0,0 +1,361 @@
{
"cells": [
{
"cell_type": "raw",
"id": "f99024c2",
"metadata": {},
"source": [
"---\n",
"title: \"Samplers comparison\"\n",
"format:\n",
" html:\n",
" code-fold: true\n",
"jupyer: python3\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "c4d2ed1e",
"metadata": {},
"source": [
"**License**\n",
"\n",
" rosenbrock_simple\n",
"\n",
" Sun Apr 02 21:13:00 2023\\\n",
" Copyright 2023\\\n",
" Sandro Dias Pinto Vitenti <vitenti@uel.br>\n",
"___\n",
" rosenbrock_simple\\\n",
" Copyright (C) 2023 Sandro Dias Pinto Vitenti <vitenti@uel.br>\n",
"\n",
" numcosmo is free software: you can redistribute it and/or modify it\n",
" under the terms of the GNU General Public License as published by the\n",
" Free Software Foundation, either version 3 of the License, or\n",
" (at your option) any later version.\n",
"\n",
" numcosmo is distributed in the hope that it will be useful, but\n",
" WITHOUT ANY WARRANTY; without even the implied warranty of\n",
" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n",
" See the GNU General Public License for more details.\n",
"\n",
" You should have received a copy of the GNU General Public License along\n",
" with this program. If not, see <http://www.gnu.org/licenses/>."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "regulated-ballet",
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"from numcosmo_py import Ncm\n",
"\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib.animation import FuncAnimation\n",
"from IPython.display import HTML\n",
"\n",
"import numpy as np\n",
"import math\n",
"import warnings\n",
"\n",
"import getdist\n",
"from getdist import plots\n",
"import pandas as pd\n",
"import getdist\n",
"from getdist import plots\n",
"\n",
"\n",
"from numcosmo_py.sampling.apes import APES\n",
"from numcosmo_py.sampling.catalog import Catalog\n",
"from numcosmo_py.plotting.tools import set_rc_params_article\n",
"\n",
"import emcee\n",
"import zeus"
]
},
{
"cell_type": "markdown",
"id": "239fecbf",
"metadata": {},
"source": [
"### Initialize NumCosmo and sampling parameters\n",
"\n",
"In this notebook, we will compare the performance of three different samplers: APES, Emcee, and Zeus. To start, we will use the new and experimental Python interface for NumCosmo.\n",
"\n",
"Next, we will define the sampling configuration. Our goal is to generate a total of 300,000 points across all three samplers, with each sampler using 300 walkers. This translates to 1000 steps on each sampler to reach our desired sample size. By comparing the performance of these three samplers using the NumCosmo Python interface, we can evaluate their respective capabilities and identify any advantages or limitations for our specific problem."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "comparative-matthew",
"metadata": {},
"outputs": [],
"source": [
"Ncm.cfg_init()\n",
"Ncm.cfg_set_log_handler(lambda msg: sys.stdout.write(msg) and sys.stdout.flush())\n",
"\n",
"ssize = 300000\n",
"nwalkers = 300\n",
"burin_steps = 50\n",
"verbose = False"
]
},
{
"cell_type": "markdown",
"id": "8d4c2f5e",
"metadata": {},
"source": [
"### Probability definition\n",
"\n",
"In the cell below, we will define the unnormalized Rosenbrock distribution, which is known to be a difficult distribution to sample from. The Rosenbrock distribution is often used as a benchmark for testing the performance of sampling algorithms. By using this challenging distribution, we can better understand how well the samplers perform under difficult sampling conditions.\n",
"\n",
"We will generate a single initial sample point with random normal realizations having a mean of zero and a standard deviation of one. This initial point will be used as the starting point for each of the samplers, ensuring that they all start at the same point.\n",
"\n",
"It's worth noting that this initial sample point is very different from a sample from an actual Rosenbrock distribution. However, this is intentional, as we want to see how each sampler performs under challenging conditions."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "41bd0899",
"metadata": {},
"outputs": [],
"source": [
"def log_prob(x, ivar):\n",
" return -0.5 * (100.0 * (x[1] - x[0] * x[0]) ** 2 + (1.0 - x[0]) ** 2) * 1.0e-1\n",
"\n",
"ndim, nwalkers = 2, nwalkers\n",
"p0 = np.random.randn(nwalkers, ndim)"
]
},
{
"cell_type": "markdown",
"id": "6105a71f",
"metadata": {},
"source": [
"### Runnig NumComo's APES\n",
"\n",
"In the cell below, we will run NumCosmo's APES algorithm using the configuration defined above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f40217f",
"metadata": {},
"outputs": [],
"source": [
"sampler_apes = APES(\n",
" nwalkers=nwalkers, ndim=ndim, model=None, log_prob=log_prob, args=()\n",
")\n",
"sampler_apes.run_mcmc(p0, ssize // nwalkers)\n",
"mcat_apes = sampler_apes.get_catalog()\n",
"mcat_apes.trim(burin_steps)"
]
},
{
"cell_type": "markdown",
"id": "cb1b0cfa",
"metadata": {},
"source": [
"### Running Emcee\n",
"\n",
"In the cell below, we will run the Emcee algorithm with the same initial point `p0` generated previously. We will generate a chain of samples using Emcee and store the resulting chain in a `Catalog` object. This will allow us to apply the same tests to all algorithms and compare their performance on an even footing."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "proof-member",
"metadata": {},
"outputs": [],
"source": [
"sampler_emcee = emcee.EnsembleSampler(nwalkers, ndim, log_prob, args=[0])\n",
"state_emcee = sampler_emcee.run_mcmc(p0, ssize // nwalkers)\n",
"chain_emcee = sampler_emcee.get_chain(flat=True)\n",
"log_prob_emcee = sampler_emcee.get_log_prob(flat=True)\n",
"mcat_emcee = Catalog(ndim=ndim, nwalkers=nwalkers, run_type=\"EMCEE\")\n",
"mcat_emcee.add_points_m2lnp(chain_emcee, -2.0 * log_prob_emcee)\n",
"mcat_emcee.trim(burin_steps)"
]
},
{
"cell_type": "markdown",
"id": "f85f7bfc",
"metadata": {},
"source": [
"### Running Zeus\n",
"\n",
"In the cell below, we will run the Zeus algorithm with the same initial point `p0` generated previously. We will generate a chain of samples using Zeus and store the resulting chain in a `Catalog` object. One difference from Emcee is that Zeus output chains are not interweaved, so we need to inform the `Catalog` object of this fact to ensure that autocorrelation estimates are correct."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c032d2e7",
"metadata": {},
"outputs": [],
"source": [
"sampler_zeus = zeus.EnsembleSampler(nwalkers, ndim, log_prob, args=[0], verbose=verbose)\n",
"sampler_zeus.run_mcmc(p0, ssize // nwalkers, progress=False)\n",
"chain_zeus = sampler_zeus.get_chain(flat=True)\n",
"log_prob_zeus = sampler_zeus.get_log_prob(flat=True)\n",
"mcat_zeus = Catalog(ndim=ndim, nwalkers=nwalkers, run_type=\"ZEUS\")\n",
"mcat_zeus.add_points_m2lnp(chain_zeus, -2.0 * log_prob_zeus, interweaved=False)\n",
"mcat_zeus.trim(burin_steps)"
]
},
{
"cell_type": "markdown",
"id": "89dedca6",
"metadata": {},
"source": [
"### Analyzing results\n",
"\n",
"In the cell below, we will create two arrays containing the true values for the mean and standard deviation of the Rosenbrock distribution. We will then call `print_status` on each catalog to obtain the current mean, standard deviation, mean standard deviation, and autocorrelation time $\\tau$. Based on the results, we can see that APES has the lowest autocorrelation time $\\tau$ for all parameters, being more than 100 times smaller than the autocorrelation time obtained by Emcee and Zeus. \n",
"\n",
"This indicates that APES was able to explore the parameter space more efficiently and produce less correlated samples. Additionally, we can see that the mean, standard deviation and mean standard deviation obtained by all samplers are close to the true values of the Rosenbrock distribution. However, it is important to note that the Emcee and Zeus chains may not have fully converged yet, so it is possible that their mean and variance estimates, as well as their autocorrelation times, could improve with further sampling."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "65422ec8",
"metadata": {},
"outputs": [],
"source": [
"sigma = np.sqrt([10.0, 2401 / 10])\n",
"mean = np.array([1.0, 11.0])\n",
"\n",
"Ncm.cfg_msg_sepa()\n",
"mcat_apes.print_status()\n",
"Ncm.cfg_msg_sepa()\n",
"mcat_emcee.print_status()\n",
"Ncm.cfg_msg_sepa()\n",
"mcat_zeus.print_status()"
]
},
{
"cell_type": "markdown",
"id": "bdb83a21",
"metadata": {},
"source": [
"### Comparing the mean estimates\n",
"\n",
"Below, we compare the mean estimates obtained by each sampler and print the relative error on each estimate. As we can see, APES is the only sampler that is able to recover the true means within 1% - 2% error, while Emcee and Zeus exhibit much larger errors. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2647e55a",
"metadata": {},
"outputs": [],
"source": [
"print(np.abs(mcat_apes.get_mean() / mean - 1.0))\n",
"print(np.abs(mcat_emcee.get_mean() / mean - 1.0))\n",
"print(np.abs(mcat_zeus.get_mean() / mean - 1.0))"
]
},
{
"cell_type": "markdown",
"id": "09c066c2",
"metadata": {},
"source": [
"### Comparing the standard deviation estimates\n",
"\n",
"Similar to the previous comparison, we now look at the relative error on the standard deviation estimates. Once again, APES performs the best among the samplers, with estimates within 1% - 4% of the true values. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f49498f3",
"metadata": {},
"outputs": [],
"source": [
"print(np.abs(np.sqrt(np.diagonal(mcat_apes.get_covar())) / sigma - 1.0))\n",
"print(np.abs(np.sqrt(np.diagonal(mcat_emcee.get_covar())) / sigma - 1.0))\n",
"print(np.abs(np.sqrt(np.diagonal(mcat_zeus.get_covar())) / sigma - 1.0))"
]
},
{
"cell_type": "markdown",
"id": "9d28813e",
"metadata": {},
"source": [
"### Using `getdist` on the chains\n",
"\n",
"In this next cell, we convert the catalog of MCMC chains for each algorithm into `MCSamples` objects using the `getdist` package. This allows us to easily plot the joint and marginal posterior distributions of the parameters using the `getdist` `plotter`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b0bdd50f",
"metadata": {},
"outputs": [],
"source": [
"mcs_apes = mcat_apes.get_mcsamples(\"APES\")\n",
"mcs_emcee = mcat_emcee.get_mcsamples(\"EMCEE\")\n",
"mcs_zeus = mcat_zeus.get_mcsamples(\"ZEUS\")"
]
},
{
"cell_type": "markdown",
"id": "1fdf7fb9",
"metadata": {},
"source": [
"### Corner plot\n",
"\n",
"Finally, we create a joint corner plot of the samples using `getdist`. We can see that the other samplers (Emcee and Zeus) have more difficulty in sampling the tails, leading to the 1D marginals being more concentrated and reporting smaller variance. This is reflected in the larger relative errors in the mean and standard deviation estimates we saw earlier."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "88d0e2d8",
"metadata": {},
"outputs": [],
"source": [
"set_rc_params_article(ncol=2)\n",
"\n",
"g = getdist.plots.get_subplot_plotter(width_inch=plt.rcParams[\"figure.figsize\"][0])\n",
"g.settings.linewidth = 0.01\n",
"g.triangle_plot([mcs_apes, mcs_emcee, mcs_zeus], shaded=True)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
},
"vscode": {
"interpreter": {
"hash": "767d51c1340bd893661ea55ea3124f6de3c7a262a8b4abca0554b478b1e2ff90"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}
