DOC: Fix many small errors in examples

Fix many small issues in docs
Install missing R packages
bashtage · Apr 30, 2019 · 8233beb
1 parent 303747f
commit 8233beb

Showing 11 changed files with 56 additions and 71 deletions.
diff --git a/examples/notebooks/discrete_choice_example.ipynb b/examples/notebooks/discrete_choice_example.ipynb
@@ -162,9 +162,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "resp = dict(zip(range(1,9), respondent1000[[\"occupation\", \"educ\", \n",
-    "                                            \"occupation_husb\", \"rate_marriage\", \n",
-    "                                            \"age\", \"yrs_married\", \"children\", \n",
+    "resp = dict(zip(range(1,9), respondent1000[[\"occupation\", \"educ\",\n",
+    "                                            \"occupation_husb\", \"rate_marriage\",\n",
+    "                                            \"age\", \"yrs_married\", \"children\",\n",
     "                                            \"religious\"]].tolist()))\n",
     "resp.update({0 : 1})\n",
     "print(resp)"
@@ -365,7 +365,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from scipy.misc import comb\n",
+    "from scipy.special import comb\n",
     "comb(5,2) * (1/6.)**2 * (5/6.)**3"
    ]
   },
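Review note: the expression in this cell is the binomial probability of exactly two successes in five trials with success probability 1/6. A minimal sketch that cross-checks the value with the standard library only (`math.comb`, Python 3.8+, stands in here for `scipy.special.comb`, so this is not the notebook's own code; the variable names are illustrative):

```python
from fractions import Fraction
from math import comb

n_trials, n_success = 5, 2
p = Fraction(1, 6)  # probability of success on a single trial

# Binomial pmf: C(n, k) * p**k * (1 - p)**(n - k), kept exact with Fraction
prob_exact = comb(n_trials, n_success) * p**n_success * (1 - p)**(n_trials - n_success)
prob = float(prob_exact)

print(prob_exact, prob)
```

Keeping the arithmetic in `Fraction` avoids floating-point rounding until the final conversion, which makes the check easy to verify by hand.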
@@ -531,7 +531,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Pearson residuals are defined to be \n",
+    "Pearson residuals are defined to be\n",
     "\n",
     "$$\\frac{(y - \\mu)}{\\sqrt{(var(\\mu))}}$$\n",
     "\n",
@@ -563,7 +563,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The definition of the deviance residuals depends on the family. For the Binomial distribution this is \n",
+    "The definition of the deviance residuals depends on the family. For the Binomial distribution this is\n",
     "\n",
     "$$r_{dev} = sign\\left(Y-\\mu\\right)*\\sqrt{2n(Y\\log\\frac{Y}{\\mu}+(1-Y)\\log\\frac{(1-Y)}{(1-\\mu)}}$$\n",
     "\n",
@@ -577,7 +577,7 @@
    "outputs": [],
    "source": [
     "resid = glm_mod.resid_deviance\n",
-    "resid_std = stats.zscore(resid) \n",
+    "resid_std = stats.zscore(resid)\n",
     "kde_resid = sm.nonparametric.KDEUnivariate(resid_std)\n",
     "kde_resid.fit()"
    ]
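Review note: the Binomial deviance residual defined in the markdown cell above can be sketched in plain Python. This assumes the formula is read with the closing parenthesis placed after the second log term, and it uses the convention that 0 * log(0) = 0. The function name and its `n` argument are illustrative and are not statsmodels API.

```python
import math


def binomial_deviance_residual(y, mu, n=1):
    """Deviance residual for a Binomial proportion y with fitted mean mu."""

    def xlogy(a, b):
        # a * log(a / b), with the limit 0 * log(0) taken as 0
        return 0.0 if a == 0 else a * math.log(a / b)

    dev = 2.0 * n * (xlogy(y, mu) + xlogy(1.0 - y, 1.0 - mu))
    # Guard against tiny negative values from rounding before the square root
    return math.copysign(math.sqrt(max(dev, 0.0)), y - mu)


r_above = binomial_deviance_residual(1.0, 0.5)  # observed above the fit
r_below = binomial_deviance_residual(0.0, 0.5)  # observed below the fit
r_exact = binomial_deviance_residual(0.5, 0.5)  # observed equals the fit
```

The residual carries the sign of `y - mu` and is zero when the observation matches the fitted mean.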
@@ -590,7 +590,7 @@
    "source": [
     "fig = plt.figure(figsize=(12,8))\n",
     "ax = fig.add_subplot(111, title=\"Standardized Deviance Residuals\")\n",
-    "ax.hist(resid_std, bins=25, normed=True);\n",
+    "ax.hist(resid_std, bins=25, density=True);\n",
     "ax.plot(kde_resid.support, kde_resid.density, 'r');"
    ]
   },

diff --git a/examples/notebooks/discrete_choice_overview.ipynb b/examples/notebooks/discrete_choice_overview.ipynb
@@ -37,7 +37,7 @@
    },
    "outputs": [],
    "source": [
-    "spector_data = sm.datasets.spector.load()\n",
+    "spector_data = sm.datasets.spector.load(as_pandas=False)\n",
     "spector_data.exog = sm.add_constant(spector_data.exog, prepend=False)"
    ]
   },
@@ -90,9 +90,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "logit_mod = sm.Logit(spector_data.endog, spector_data.exog)\n",
@@ -182,7 +180,7 @@
    },
    "outputs": [],
    "source": [
-    "anes_data = sm.datasets.anes96.load()\n",
+    "anes_data = sm.datasets.anes96.load(as_pandas=False)\n",
     "anes_exog = anes_data.exog\n",
     "anes_exog = sm.add_constant(anes_exog, prepend=False)"
    ]
@@ -243,7 +241,7 @@
    },
    "outputs": [],
    "source": [
-    "rand_data = sm.datasets.randhie.load()\n",
+    "rand_data = sm.datasets.randhie.load(as_pandas=False)\n",
     "rand_exog = rand_data.exog.view(float).reshape(len(rand_data.exog), -1)\n",
     "rand_exog = sm.add_constant(rand_exog, prepend=False)"
    ]
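Review note: the `view(float).reshape(len(...), -1)` call in this hunk suggests that `exog` is a NumPy structured array whose fields are all `float64`. That is an assumption, not something the diff confirms. Under that assumption, a small sketch of what the conversion does (the array and field names are hypothetical):

```python
import numpy as np

# Hypothetical stand-in for a structured exog array with two float64 fields
rec = np.array(
    [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)],
    dtype=[("x1", float), ("x2", float)],
)

# Reinterpret the same memory as plain floats, then restore one row per record
plain = rec.view(float).reshape(len(rec), -1)

print(plain.shape)
```

The view only works when every field has the same float dtype and there is no padding between fields.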

diff --git a/examples/notebooks/glm.ipynb b/examples/notebooks/glm.ipynb
@@ -66,7 +66,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "data = sm.datasets.star98.load()\n",
+    "data = sm.datasets.star98.load(as_pandas=False)\n",
     "data.exog = sm.add_constant(data.exog, prepend=False)"
    ]
   },

diff --git a/examples/notebooks/gls.ipynb b/examples/notebooks/gls.ipynb
@@ -36,7 +36,7 @@
    },
    "outputs": [],
    "source": [
-    "data = sm.datasets.longley.load()\n",
+    "data = sm.datasets.longley.load(as_pandas=False)\n",
     "data.exog = sm.add_constant(data.exog)\n",
     "print(data.exog[:5])"
    ]
@@ -73,7 +73,7 @@
     "$\\epsilon_i = \\beta_0 + \\rho\\epsilon_{i-1} + \\eta_i$\n",
     "\n",
     "where $\\eta \\sim N(0,\\Sigma^2)$\n",
-    " \n",
+    "\n",
     "and that $\\rho$ is simply the correlation of the residual a consistent estimator for rho is to regress the residuals on the lagged residuals"
    ]
   },
@@ -152,9 +152,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "sigma = rho**order\n",
@@ -166,17 +164,15 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Of course, the exact rho in this instance is not known so it it might make more sense to use feasible gls, which currently only has experimental support. \n",
+    "Of course, the exact rho in this instance is not known so it it might make more sense to use feasible gls, which currently only has experimental support.\n",
     "\n",
     "We can use the GLSAR model with one lag, to get to a similar result:"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "glsar_model = sm.GLSAR(data.endog, data.exog, 1)\n",
@@ -198,9 +194,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "print(gls_results.params)\n",
@@ -226,9 +220,9 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.4.3"
+   "version": "3.6.6"
   }
  },
  "nbformat": 4,
- "nbformat_minor": 0
+ "nbformat_minor": 1
 }
diff --git a/examples/notebooks/interactions_anova.ipynb b/examples/notebooks/interactions_anova.ipynb
@@ -363,16 +363,14 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "try:\n",
-    "    resid = interM_lm32.get_influence().summary_frame()['standard_resid']\n",
-    "except:\n",
-    "    resid = interM_lm32.get_influence().summary_frame()['standard_resid']\n",
+    "resid = interM_lm32.get_influence().summary_frame()['standard_resid']\n",
     "\n",
     "plt.figure(figsize=(6,6))\n",
+    "resid = resid.reindex(X.index)\n",
     "for values, group in factor_groups:\n",
     "    i,j = values\n",
     "    idx = group.index\n",
-    "    plt.scatter(X[idx], resid[idx], marker=symbols[j], color=colors[i-1],\n",
+    "    plt.scatter(X.loc[idx], resid.loc[idx], marker=symbols[j], color=colors[i-1],\n",
     "            s=144, edgecolors='black')\n",
     "plt.xlabel('X[~[32]]');\n",
     "plt.ylabel('standardized resids');"
@@ -402,8 +400,9 @@
     "    plt.scatter(X[idx], S[idx], marker=symbols[j], color=colors[i-1],\n",
     "                s=144, edgecolors='black')\n",
     "    # drop NA because there is no idx 32 in the final model\n",
-    "    plt.plot(mf.X[idx].dropna(), lm_final.fittedvalues[idx].dropna(),\n",
-    "            ls=lstyle[j], color=colors[i-1])\n",
+    "    fv = lm_final.fittedvalues.reindex(idx).dropna()\n",
+    "    x = mf.X.reindex(idx).dropna()\n",
+    "    plt.plot(x, fv, ls=lstyle[j], color=colors[i-1])\n",
     "plt.xlabel('Experience');\n",
     "plt.ylabel('Salary');"
    ]
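Review note: both hunks in this file move from positional-looking `X[idx]` selection to label-based selection after a `reindex`. A minimal sketch of why that matters when a row has been dropped from the fit (the data and names are made up for illustration; recent pandas releases raise `KeyError` when `.loc` is given a list containing a missing label):

```python
import pandas as pd

X = pd.Series([10.0, 20.0, 30.0, 40.0], index=[0, 1, 2, 3])
# Residuals from a fit that excluded the row labelled 2
resid = pd.Series([0.5, -1.2, 0.7], index=[0, 1, 3])

# Align residuals to the full index; the missing label becomes NaN
aligned = resid.reindex(X.index)

idx = pd.Index([1, 2, 3])      # labels belonging to one group
x_group = X.loc[idx]           # label-based selection
r_group = aligned.loc[idx]     # safe: every label now exists
plotted = r_group.dropna()     # drop the row that was not in the fit
```

Reindexing first means every group index can be looked up safely, and `dropna()` then removes only the rows that were genuinely absent from the model.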

diff --git a/examples/notebooks/kernel_density.ipynb b/examples/notebooks/kernel_density.ipynb
@@ -71,7 +71,7 @@
     "dist2_loc, dist2_scale, weight2 = 1 , .5, .75\n",
     "\n",
     "# Sample from a mixture of distributions\n",
-    "obs_dist = mixture_rvs(prob=[weight1, weight2], size=250, \n",
+    "obs_dist = mixture_rvs(prob=[weight1, weight2], size=250,\n",
     "                        dist=[stats.norm, stats.norm],\n",
     "                        kwargs = (dict(loc=dist1_loc, scale=dist1_scale),\n",
     "                                  dict(loc=dist2_loc, scale=dist2_scale)))"
@@ -94,7 +94,7 @@
     "ax = fig.add_subplot(111)\n",
     "\n",
     "# Scatter plot of data samples and histogram\n",
-    "ax.scatter(obs_dist, np.abs(np.random.randn(obs_dist.size)), \n",
+    "ax.scatter(obs_dist, np.abs(np.random.randn(obs_dist.size)),\n",
     "            zorder=15, color='red', marker='x', alpha=0.5, label='Samples')\n",
     "lines = ax.hist(obs_dist, bins=20, edgecolor='k', label='Histogram')\n",
     "\n",
@@ -146,7 +146,7 @@
     "ax = fig.add_subplot(111)\n",
     "\n",
     "# Plot the histrogram\n",
-    "ax.hist(obs_dist, bins=20, normed=True, label='Histogram from samples', \n",
+    "ax.hist(obs_dist, bins=20, density=True, label='Histogram from samples',\n",
     "        zorder=5, edgecolor='k', alpha=0.5)\n",
     "\n",
     "# Plot the KDE as fitted using the default arguments\n",
@@ -158,7 +158,7 @@
     "ax.plot(kde.support, true_values, lw=3, label='True distribution', zorder=15)\n",
     "\n",
     "# Plot the samples\n",
-    "ax.scatter(obs_dist, np.abs(np.random.randn(obs_dist.size))/40, \n",
+    "ax.scatter(obs_dist, np.abs(np.random.randn(obs_dist.size))/40,\n",
     "           marker='x', color='red', zorder=20, label='Samples', alpha=0.5)\n",
     "\n",
     "ax.legend(loc='best')\n",
@@ -197,8 +197,8 @@
     "ax = fig.add_subplot(111)\n",
     "\n",
     "# Plot the histrogram\n",
-    "ax.hist(obs_dist, bins=25, label='Histogram from samples', \n",
-    "        zorder=5, edgecolor='k', normed=True, alpha=0.5)\n",
+    "ax.hist(obs_dist, bins=25, label='Histogram from samples',\n",
+    "        zorder=5, edgecolor='k', density=True, alpha=0.5)\n",
     "\n",
     "# Plot the KDE for various bandwidths\n",
     "for bandwidth in [0.1, 0.2, 0.4]:\n",
@@ -210,7 +210,7 @@
     "ax.plot(kde.support, true_values, lw=3, label='True distribution', zorder=15)\n",
     "\n",
     "# Plot the samples\n",
-    "ax.scatter(obs_dist, np.abs(np.random.randn(obs_dist.size))/50, \n",
+    "ax.scatter(obs_dist, np.abs(np.random.randn(obs_dist.size))/50,\n",
     "           marker='x', color='red', zorder=20, label='Data samples', alpha=0.5)\n",
     "\n",
     "ax.legend(loc='best')\n",
@@ -260,10 +260,10 @@
     "\n",
     "# Enumerate every option for the kernel\n",
     "for i, (ker_name, ker_class) in enumerate(kernel_switch.items()):\n",
-    "    \n",
+    "\n",
     "    # Initialize the kernel object\n",
     "    kernel = ker_class()\n",
-    "    \n",
+    "\n",
     "    # Sample from the domain\n",
     "    domain = kernel.domain or [-3, 3]\n",
     "    x_vals = np.linspace(*domain, num=2**10)\n",
@@ -276,7 +276,7 @@
     "    ax.scatter([0], [0], marker='x', color='red')\n",
     "    plt.grid(True, zorder=-5)\n",
     "    ax.set_xlim(domain)\n",
-    "    \n",
+    "\n",
     "plt.tight_layout()"
    ]
   },
@@ -309,20 +309,20 @@
     "\n",
     "# Enumerate every option for the kernel\n",
     "for i, kernel in enumerate(kernel_switch.keys()):\n",
-    "    \n",
+    "\n",
     "    # Create a subplot, set the title\n",
     "    ax = fig.add_subplot(2, 4, i + 1)\n",
     "    ax.set_title('Kernel function \"{}\"'.format(kernel))\n",
-    "    \n",
+    "\n",
     "    # Fit the model (estimate densities)\n",
     "    kde.fit(kernel=kernel, fft=False, gridsize=2**10)\n",
-    "    \n",
+    "\n",
     "    # Create the plot\n",
     "    ax.plot(kde.support, kde.density, lw=3, label='KDE from samples', zorder=10)\n",
     "    ax.scatter(data, np.zeros_like(data), marker='x', color='red')\n",
     "    plt.grid(True, zorder=-5)\n",
     "    ax.set_xlim([-3, 3])\n",
-    "    \n",
+    "\n",
     "plt.tight_layout()"
    ]
   },
@@ -363,10 +363,10 @@
    "source": [
     "fig = plt.figure(figsize=(12, 5))\n",
     "ax = fig.add_subplot(111)\n",
-    "ax.hist(obs_dist, bins=20, normed=True, edgecolor='k', zorder=4, alpha=0.5)\n",
+    "ax.hist(obs_dist, bins=20, density=True, edgecolor='k', zorder=4, alpha=0.5)\n",
     "ax.plot(kde.support, kde.density, lw=3, zorder=7)\n",
     "# Plot the samples\n",
-    "ax.scatter(obs_dist, np.abs(np.random.randn(obs_dist.size))/50, \n",
+    "ax.scatter(obs_dist, np.abs(np.random.randn(obs_dist.size))/50,\n",
     "           marker='x', color='red', zorder=20, label='Data samples', alpha=0.5)\n",
     "ax.grid(True, zorder=-5)"
    ]
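Review note: `density=True` replaces the older `normed=True` keyword throughout this file. As a hedged illustration of what the normalisation means, the sketch below uses `np.histogram`, which accepts the same `density` flag; it is not the notebook's plotting code, and the sample is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(size=250)  # synthetic data, same size as obs_dist in the notebook

# Raw counts per bin
counts, edges = np.histogram(sample, bins=20)

# Density: counts divided by (total count * bin width)
heights, _ = np.histogram(sample, bins=20, density=True)

# The bars of a density histogram integrate to one
area = float(np.sum(heights * np.diff(edges)))
print(counts.sum(), area)
```

Because the bars integrate to one, the histogram sits on the same vertical scale as the KDE curve drawn over it.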

diff --git a/examples/notebooks/mixed_lm_example.ipynb b/examples/notebooks/mixed_lm_example.ipynb
@@ -16,6 +16,7 @@
     "%matplotlib inline\n",
     "\n",
     "import numpy as np\n",
+    "import pandas as pd\n",
     "import statsmodels.api as sm\n",
     "import statsmodels.formula.api as smf"
    ]
@@ -47,7 +48,7 @@
     "\n",
     "The Statsmodels imputation of linear mixed models (MixedLM) closely follows the approach outlined in Lindstrom and Bates (JASA 1988).  This is also the approach followed in the  R package LME4.  Other packages such as Stata, SAS, etc. should also be consistent with this approach, as the basic techniques in this area are mostly mature.\n",
     "\n",
-    "Here we show how linear mixed models can be fit using the MixedLM procedure in Statsmodels.  Results from R (LME4) are included for comparison.  \n",
+    "Here we show how linear mixed models can be fit using the MixedLM procedure in Statsmodels.  Results from R (LME4) are included for comparison.\n",
     "\n",
     "Here are our import statements:"
    ]
@@ -86,7 +87,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "%%R \n",
+    "%%R\n",
     "data(dietox, package='geepack')"
    ]
   },
@@ -161,7 +162,7 @@
    "source": [
     "md = smf.mixedlm(\"Weight ~ Time\", data, groups=data[\"Pig\"],\n",
     "                  re_formula=\"~Time\")\n",
-    "free = sm.regression.mixed_linear_model.MixedLMParams.from_components(np.ones(2), \n",
+    "free = sm.regression.mixed_linear_model.MixedLMParams.from_components(np.ones(2),\n",
     "                                                                      np.eye(2))\n",
     "\n",
     "mdf = md.fit(free=free)\n",
@@ -172,7 +173,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The likelihood drops by 0.3 when we fix the correlation parameter to 0.  Comparing 2 x 0.3 = 0.6 to the chi^2 1 df reference distribution suggests that the data are very consistent with a model in which this parameter is equal to 0.  \n",
+    "The likelihood drops by 0.3 when we fix the correlation parameter to 0.  Comparing 2 x 0.3 = 0.6 to the chi^2 1 df reference distribution suggests that the data are very consistent with a model in which this parameter is equal to 0.\n",
     "\n",
     "Here is the same model fit using LMER in R (note that here R is reporting the REML criterion instead of the likelihood, where the REML criterion is twice the log likeihood):"
    ]
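Review note: the likelihood-ratio comparison described in this markdown cell can be checked with the standard library alone. For one degree of freedom, the chi-square survival function reduces to `erfc(sqrt(x / 2))`. The variable names are illustrative.

```python
import math

loglike_drop = 0.3            # drop in log-likelihood quoted in the notebook text
lr_stat = 2 * loglike_drop    # likelihood-ratio statistic, 2 x 0.3 = 0.6

# Survival function of a chi-square variable with 1 degree of freedom
p_value = math.erfc(math.sqrt(lr_stat / 2))

print(round(lr_stat, 1), p_value)
```

The resulting p-value is roughly 0.44, far above conventional thresholds, which is what the text means by the data being very consistent with a zero correlation parameter.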
@@ -347,13 +348,6 @@
     "plt.xlabel(\"Variance of random slope\", size=17)\n",
     "plt.ylabel(\"-2 times profile log likelihood\", size=17)"
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {

diff --git a/examples/notebooks/regression_diagnostics.ipynb b/examples/notebooks/regression_diagnostics.ipynb
@@ -11,7 +11,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This example file shows how to use a few of the ``statsmodels`` regression diagnostic tests in a real-life context. You can learn about more tests and find out more information about the tests here on the [Regression Diagnostics page.](https://www.statsmodels.org/stable/diagnostic.html) \n",
+    "This example file shows how to use a few of the ``statsmodels`` regression diagnostic tests in a real-life context. You can learn about more tests and find out more information about the tests here on the [Regression Diagnostics page.](https://www.statsmodels.org/stable/diagnostic.html)\n",
     "\n",
     "Note that most of the tests described here only return a tuple of numbers, without any annotation. A full description of outputs is always included in the docstring and in the online ``statsmodels`` documentation. For presentation purposes, we use the ``zip(name,test)`` construct to pretty-print short descriptions in the examples below."
    ]