DOC: Fix many small errors in examples
Fix many small issues in docs
Install missing R packages
bashtage committed Apr 30, 2019
1 parent 303747f commit 8233beb
Showing 11 changed files with 56 additions and 71 deletions.
16 changes: 8 additions & 8 deletions examples/notebooks/discrete_choice_example.ipynb
@@ -162,9 +162,9 @@
"metadata": {},
"outputs": [],
"source": [
"resp = dict(zip(range(1,9), respondent1000[[\"occupation\", \"educ\", \n",
" \"occupation_husb\", \"rate_marriage\", \n",
" \"age\", \"yrs_married\", \"children\", \n",
"resp = dict(zip(range(1,9), respondent1000[[\"occupation\", \"educ\",\n",
" \"occupation_husb\", \"rate_marriage\",\n",
" \"age\", \"yrs_married\", \"children\",\n",
" \"religious\"]].tolist()))\n",
"resp.update({0 : 1})\n",
"print(resp)"
@@ -365,7 +365,7 @@
"metadata": {},
"outputs": [],
"source": [
"from scipy.misc import comb\n",
"from scipy.special import comb\n",
"comb(5,2) * (1/6.)**2 * (5/6.)**3"
]
},
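Note on the hunk above: the import moves because `comb` lives in `scipy.special`; the old `scipy.misc` alias was deprecated and is absent from recent SciPy releases. A minimal sketch of the same binomial calculation follows. The fallback to the standard-library `math.comb` and the variable names are assumptions added here so the snippet runs without SciPy; they are not part of the commit.

```python
# Probability of exactly 2 sixes in 5 rolls of a fair die.
try:
    from scipy.special import comb  # import location used by the commit
except ImportError:
    # Assumption: fall back to the standard library when SciPy is not installed.
    from math import comb

n_rolls, n_sixes = 5, 2
p_six = 1 / 6.0

# Binomial pmf: C(n, k) * p**k * (1 - p)**(n - k)
prob = comb(n_rolls, n_sixes) * p_six**n_sixes * (1 - p_six)**(n_rolls - n_sixes)
print(prob)
```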
@@ -531,7 +531,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Pearson residuals are defined to be \n",
"Pearson residuals are defined to be\n",
"\n",
"$$\\frac{(y - \\mu)}{\\sqrt{(var(\\mu))}}$$\n",
"\n",
@@ -563,7 +563,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The definition of the deviance residuals depends on the family. For the Binomial distribution this is \n",
"The definition of the deviance residuals depends on the family. For the Binomial distribution this is\n",
"\n",
"$$r_{dev} = sign\\left(Y-\\mu\\right)*\\sqrt{2n(Y\\log\\frac{Y}{\\mu}+(1-Y)\\log\\frac{(1-Y)}{(1-\\mu)}}$$\n",
"\n",
@@ -577,7 +577,7 @@
"outputs": [],
"source": [
"resid = glm_mod.resid_deviance\n",
"resid_std = stats.zscore(resid) \n",
"resid_std = stats.zscore(resid)\n",
"kde_resid = sm.nonparametric.KDEUnivariate(resid_std)\n",
"kde_resid.fit()"
]
@@ -590,7 +590,7 @@
"source": [
"fig = plt.figure(figsize=(12,8))\n",
"ax = fig.add_subplot(111, title=\"Standardized Deviance Residuals\")\n",
"ax.hist(resid_std, bins=25, normed=True);\n",
"ax.hist(resid_std, bins=25, density=True);\n",
"ax.plot(kde_resid.support, kde_resid.density, 'r');"
]
},
10 changes: 4 additions & 6 deletions examples/notebooks/discrete_choice_overview.ipynb
@@ -37,7 +37,7 @@
},
"outputs": [],
"source": [
"spector_data = sm.datasets.spector.load()\n",
"spector_data = sm.datasets.spector.load(as_pandas=False)\n",
"spector_data.exog = sm.add_constant(spector_data.exog, prepend=False)"
]
},
@@ -90,9 +90,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"logit_mod = sm.Logit(spector_data.endog, spector_data.exog)\n",
@@ -182,7 +180,7 @@
},
"outputs": [],
"source": [
"anes_data = sm.datasets.anes96.load()\n",
"anes_data = sm.datasets.anes96.load(as_pandas=False)\n",
"anes_exog = anes_data.exog\n",
"anes_exog = sm.add_constant(anes_exog, prepend=False)"
]
@@ -243,7 +241,7 @@
},
"outputs": [],
"source": [
"rand_data = sm.datasets.randhie.load()\n",
"rand_data = sm.datasets.randhie.load(as_pandas=False)\n",
"rand_exog = rand_data.exog.view(float).reshape(len(rand_data.exog), -1)\n",
"rand_exog = sm.add_constant(rand_exog, prepend=False)"
]
2 changes: 1 addition & 1 deletion examples/notebooks/glm.ipynb
@@ -66,7 +66,7 @@
"metadata": {},
"outputs": [],
"source": [
"data = sm.datasets.star98.load()\n",
"data = sm.datasets.star98.load(as_pandas=False)\n",
"data.exog = sm.add_constant(data.exog, prepend=False)"
]
},
22 changes: 8 additions & 14 deletions examples/notebooks/gls.ipynb
@@ -36,7 +36,7 @@
},
"outputs": [],
"source": [
"data = sm.datasets.longley.load()\n",
"data = sm.datasets.longley.load(as_pandas=False)\n",
"data.exog = sm.add_constant(data.exog)\n",
"print(data.exog[:5])"
]
@@ -73,7 +73,7 @@
"$\\epsilon_i = \\beta_0 + \\rho\\epsilon_{i-1} + \\eta_i$\n",
"\n",
"where $\\eta \\sim N(0,\\Sigma^2)$\n",
" \n",
"\n",
"and that $\\rho$ is simply the correlation of the residual a consistent estimator for rho is to regress the residuals on the lagged residuals"
]
},
@@ -152,9 +152,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"sigma = rho**order\n",
@@ -166,17 +164,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course, the exact rho in this instance is not known so it it might make more sense to use feasible gls, which currently only has experimental support. \n",
"Of course, the exact rho in this instance is not known so it it might make more sense to use feasible gls, which currently only has experimental support.\n",
"\n",
"We can use the GLSAR model with one lag, to get to a similar result:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"glsar_model = sm.GLSAR(data.endog, data.exog, 1)\n",
@@ -198,9 +194,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"print(gls_results.params)\n",
@@ -226,9 +220,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 1
}
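Note on the `sigma = rho**order` cell in gls.ipynb: the definition of `order` is not shown in this diff. The sketch below assumes it is the matrix of absolute lags |i - j|, which yields the usual AR(1) correlation structure. The value of `rho` and the sample size are hypothetical.

```python
import numpy as np

rho = 0.5       # hypothetical AR(1) coefficient
n_obs = 4       # hypothetical number of observations

# Assumption: order[i, j] = |i - j|, the lag between observations i and j.
positions = np.arange(n_obs)
order = np.abs(positions[:, None] - positions[None, :])

# Element-wise power gives sigma[i, j] = rho ** |i - j|.
sigma = rho ** order
print(sigma)
```

Under this assumption the diagonal is 1 and the correlation decays geometrically with the lag, which is the structure a GLS fit would be given as its error covariance.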
13 changes: 6 additions & 7 deletions examples/notebooks/interactions_anova.ipynb
@@ -363,16 +363,14 @@
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" resid = interM_lm32.get_influence().summary_frame()['standard_resid']\n",
"except:\n",
" resid = interM_lm32.get_influence().summary_frame()['standard_resid']\n",
"resid = interM_lm32.get_influence().summary_frame()['standard_resid']\n",
"\n",
"plt.figure(figsize=(6,6))\n",
"resid = resid.reindex(X.index)\n",
"for values, group in factor_groups:\n",
" i,j = values\n",
" idx = group.index\n",
" plt.scatter(X[idx], resid[idx], marker=symbols[j], color=colors[i-1],\n",
" plt.scatter(X.loc[idx], resid.loc[idx], marker=symbols[j], color=colors[i-1],\n",
" s=144, edgecolors='black')\n",
"plt.xlabel('X[~[32]]');\n",
"plt.ylabel('standardized resids');"
@@ -402,8 +400,9 @@
" plt.scatter(X[idx], S[idx], marker=symbols[j], color=colors[i-1],\n",
" s=144, edgecolors='black')\n",
" # drop NA because there is no idx 32 in the final model\n",
" plt.plot(mf.X[idx].dropna(), lm_final.fittedvalues[idx].dropna(),\n",
" ls=lstyle[j], color=colors[i-1])\n",
" fv = lm_final.fittedvalues.reindex(idx).dropna()\n",
" x = mf.X.reindex(idx).dropna()\n",
" plt.plot(x, fv, ls=lstyle[j], color=colors[i-1])\n",
"plt.xlabel('Experience');\n",
"plt.ylabel('Salary');"
]
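Note on the indexing changes in this file: the commit switches to `.reindex(idx).dropna()` when some labels in `idx` are missing from the fitted values. A minimal sketch of that pattern, using a made-up Series in place of `lm_final.fittedvalues` (all names and values below are hypothetical):

```python
import pandas as pd

# Hypothetical fitted values; label 2 is absent, as if that row was dropped from the model.
fitted = pd.Series([10.0, 20.0, 30.0], index=[0, 1, 3])

# Labels belonging to one group, including the missing label 2.
idx = pd.Index([1, 2, 3])

# reindex inserts NaN for labels that are missing instead of failing on them.
aligned = fitted.reindex(idx)

# dropna then keeps only the labels that were actually present.
usable = aligned.dropna()
print(usable)
```

Label-based selection with missing labels behaves differently across pandas versions, so reindexing first is the more defensive choice here.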
32 changes: 16 additions & 16 deletions examples/notebooks/kernel_density.ipynb
@@ -71,7 +71,7 @@
"dist2_loc, dist2_scale, weight2 = 1 , .5, .75\n",
"\n",
"# Sample from a mixture of distributions\n",
"obs_dist = mixture_rvs(prob=[weight1, weight2], size=250, \n",
"obs_dist = mixture_rvs(prob=[weight1, weight2], size=250,\n",
" dist=[stats.norm, stats.norm],\n",
" kwargs = (dict(loc=dist1_loc, scale=dist1_scale),\n",
" dict(loc=dist2_loc, scale=dist2_scale)))"
@@ -94,7 +94,7 @@
"ax = fig.add_subplot(111)\n",
"\n",
"# Scatter plot of data samples and histogram\n",
"ax.scatter(obs_dist, np.abs(np.random.randn(obs_dist.size)), \n",
"ax.scatter(obs_dist, np.abs(np.random.randn(obs_dist.size)),\n",
" zorder=15, color='red', marker='x', alpha=0.5, label='Samples')\n",
"lines = ax.hist(obs_dist, bins=20, edgecolor='k', label='Histogram')\n",
"\n",
@@ -146,7 +146,7 @@
"ax = fig.add_subplot(111)\n",
"\n",
"# Plot the histrogram\n",
"ax.hist(obs_dist, bins=20, normed=True, label='Histogram from samples', \n",
"ax.hist(obs_dist, bins=20, density=True, label='Histogram from samples',\n",
" zorder=5, edgecolor='k', alpha=0.5)\n",
"\n",
"# Plot the KDE as fitted using the default arguments\n",
@@ -158,7 +158,7 @@
"ax.plot(kde.support, true_values, lw=3, label='True distribution', zorder=15)\n",
"\n",
"# Plot the samples\n",
"ax.scatter(obs_dist, np.abs(np.random.randn(obs_dist.size))/40, \n",
"ax.scatter(obs_dist, np.abs(np.random.randn(obs_dist.size))/40,\n",
" marker='x', color='red', zorder=20, label='Samples', alpha=0.5)\n",
"\n",
"ax.legend(loc='best')\n",
@@ -197,8 +197,8 @@
"ax = fig.add_subplot(111)\n",
"\n",
"# Plot the histrogram\n",
"ax.hist(obs_dist, bins=25, label='Histogram from samples', \n",
" zorder=5, edgecolor='k', normed=True, alpha=0.5)\n",
"ax.hist(obs_dist, bins=25, label='Histogram from samples',\n",
" zorder=5, edgecolor='k', density=True, alpha=0.5)\n",
"\n",
"# Plot the KDE for various bandwidths\n",
"for bandwidth in [0.1, 0.2, 0.4]:\n",
@@ -210,7 +210,7 @@
"ax.plot(kde.support, true_values, lw=3, label='True distribution', zorder=15)\n",
"\n",
"# Plot the samples\n",
"ax.scatter(obs_dist, np.abs(np.random.randn(obs_dist.size))/50, \n",
"ax.scatter(obs_dist, np.abs(np.random.randn(obs_dist.size))/50,\n",
" marker='x', color='red', zorder=20, label='Data samples', alpha=0.5)\n",
"\n",
"ax.legend(loc='best')\n",
@@ -260,10 +260,10 @@
"\n",
"# Enumerate every option for the kernel\n",
"for i, (ker_name, ker_class) in enumerate(kernel_switch.items()):\n",
" \n",
"\n",
" # Initialize the kernel object\n",
" kernel = ker_class()\n",
" \n",
"\n",
" # Sample from the domain\n",
" domain = kernel.domain or [-3, 3]\n",
" x_vals = np.linspace(*domain, num=2**10)\n",
@@ -276,7 +276,7 @@
" ax.scatter([0], [0], marker='x', color='red')\n",
" plt.grid(True, zorder=-5)\n",
" ax.set_xlim(domain)\n",
" \n",
"\n",
"plt.tight_layout()"
]
},
@@ -309,20 +309,20 @@
"\n",
"# Enumerate every option for the kernel\n",
"for i, kernel in enumerate(kernel_switch.keys()):\n",
" \n",
"\n",
" # Create a subplot, set the title\n",
" ax = fig.add_subplot(2, 4, i + 1)\n",
" ax.set_title('Kernel function \"{}\"'.format(kernel))\n",
" \n",
"\n",
" # Fit the model (estimate densities)\n",
" kde.fit(kernel=kernel, fft=False, gridsize=2**10)\n",
" \n",
"\n",
" # Create the plot\n",
" ax.plot(kde.support, kde.density, lw=3, label='KDE from samples', zorder=10)\n",
" ax.scatter(data, np.zeros_like(data), marker='x', color='red')\n",
" plt.grid(True, zorder=-5)\n",
" ax.set_xlim([-3, 3])\n",
" \n",
"\n",
"plt.tight_layout()"
]
},
@@ -363,10 +363,10 @@
"source": [
"fig = plt.figure(figsize=(12, 5))\n",
"ax = fig.add_subplot(111)\n",
"ax.hist(obs_dist, bins=20, normed=True, edgecolor='k', zorder=4, alpha=0.5)\n",
"ax.hist(obs_dist, bins=20, density=True, edgecolor='k', zorder=4, alpha=0.5)\n",
"ax.plot(kde.support, kde.density, lw=3, zorder=7)\n",
"# Plot the samples\n",
"ax.scatter(obs_dist, np.abs(np.random.randn(obs_dist.size))/50, \n",
"ax.scatter(obs_dist, np.abs(np.random.randn(obs_dist.size))/50,\n",
" marker='x', color='red', zorder=20, label='Data samples', alpha=0.5)\n",
"ax.grid(True, zorder=-5)"
]
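Note on the `normed=True` to `density=True` changes in this file: Matplotlib deprecated `normed` in favor of `density` and later removed it. The sketch below uses `numpy.histogram` rather than `ax.hist` so that it runs without a plotting backend; the sample data and seed are hypothetical. It illustrates what `density=True` means: bar heights are scaled so the histogram area is 1, which puts the bars on the same scale as a KDE curve.

```python
import numpy as np

# Hypothetical sample standing in for obs_dist.
rng = np.random.default_rng(0)
samples = rng.normal(size=250)

# density=True rescales counts so that sum(height * bin_width) == 1.
heights, edges = np.histogram(samples, bins=20, density=True)
bin_widths = np.diff(edges)
area = np.sum(heights * bin_widths)
print(area)
```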
16 changes: 5 additions & 11 deletions examples/notebooks/mixed_lm_example.ipynb
@@ -16,6 +16,7 @@
"%matplotlib inline\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import statsmodels.api as sm\n",
"import statsmodels.formula.api as smf"
]
@@ -47,7 +48,7 @@
"\n",
"The Statsmodels imputation of linear mixed models (MixedLM) closely follows the approach outlined in Lindstrom and Bates (JASA 1988). This is also the approach followed in the R package LME4. Other packages such as Stata, SAS, etc. should also be consistent with this approach, as the basic techniques in this area are mostly mature.\n",
"\n",
"Here we show how linear mixed models can be fit using the MixedLM procedure in Statsmodels. Results from R (LME4) are included for comparison. \n",
"Here we show how linear mixed models can be fit using the MixedLM procedure in Statsmodels. Results from R (LME4) are included for comparison.\n",
"\n",
"Here are our import statements:"
]
@@ -86,7 +87,7 @@
"metadata": {},
"outputs": [],
"source": [
"%%R \n",
"%%R\n",
"data(dietox, package='geepack')"
]
},
@@ -161,7 +162,7 @@
"source": [
"md = smf.mixedlm(\"Weight ~ Time\", data, groups=data[\"Pig\"],\n",
" re_formula=\"~Time\")\n",
"free = sm.regression.mixed_linear_model.MixedLMParams.from_components(np.ones(2), \n",
"free = sm.regression.mixed_linear_model.MixedLMParams.from_components(np.ones(2),\n",
" np.eye(2))\n",
"\n",
"mdf = md.fit(free=free)\n",
@@ -172,7 +173,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The likelihood drops by 0.3 when we fix the correlation parameter to 0. Comparing 2 x 0.3 = 0.6 to the chi^2 1 df reference distribution suggests that the data are very consistent with a model in which this parameter is equal to 0. \n",
"The likelihood drops by 0.3 when we fix the correlation parameter to 0. Comparing 2 x 0.3 = 0.6 to the chi^2 1 df reference distribution suggests that the data are very consistent with a model in which this parameter is equal to 0.\n",
"\n",
"Here is the same model fit using LMER in R (note that here R is reporting the REML criterion instead of the likelihood, where the REML criterion is twice the log likeihood):"
]
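Note on the likelihood comparison in the markdown cell above: the statement that 2 x 0.3 = 0.6 is unremarkable against a chi^2 reference with 1 df can be checked with a short calculation. The sketch below is not from the notebook; the helper name `chi2_sf_1df` is hypothetical, and it relies on the identity P(chi^2_1 > x) = erfc(sqrt(x / 2)), which holds only for 1 degree of freedom.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # If Z is standard normal, Z**2 is chi-square(1), so P(Z**2 > x) = P(|Z| > sqrt(x)).
    return math.erfc(math.sqrt(x / 2.0))

loglike_drop = 0.3            # drop in log-likelihood quoted in the text
lr_stat = 2 * loglike_drop    # likelihood ratio statistic, 0.6
p_value = chi2_sf_1df(lr_stat)
print(lr_stat, p_value)
```

A p-value this far from conventional thresholds agrees with the text's reading that the data are consistent with a zero correlation parameter.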
@@ -347,13 +348,6 @@
"plt.xlabel(\"Variance of random slope\", size=17)\n",
"plt.ylabel(\"-2 times profile log likelihood\", size=17)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
2 changes: 1 addition & 1 deletion examples/notebooks/regression_diagnostics.ipynb
@@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This example file shows how to use a few of the ``statsmodels`` regression diagnostic tests in a real-life context. You can learn about more tests and find out more information about the tests here on the [Regression Diagnostics page.](https://www.statsmodels.org/stable/diagnostic.html) \n",
"This example file shows how to use a few of the ``statsmodels`` regression diagnostic tests in a real-life context. You can learn about more tests and find out more information about the tests here on the [Regression Diagnostics page.](https://www.statsmodels.org/stable/diagnostic.html)\n",
"\n",
"Note that most of the tests described here only return a tuple of numbers, without any annotation. A full description of outputs is always included in the docstring and in the online ``statsmodels`` documentation. For presentation purposes, we use the ``zip(name,test)`` construct to pretty-print short descriptions in the examples below."
]
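Note on the `zip(name,test)` construct mentioned in the cell above: the idea is to pair each unlabeled number in a test's result tuple with a short description. A minimal sketch, with hypothetical labels and made-up values standing in for a real test result:

```python
# Hypothetical labels for the four numbers a diagnostic test might return.
name = ["Lagrange multiplier statistic", "p-value", "f-value", "f p-value"]

# Made-up values for illustration; a real test would return a tuple like this.
test = (4.39, 0.111, 2.32, 0.109)

# zip pairs each label with the value at the same position.
labelled = list(zip(name, test))
for label, value in labelled:
    print("{}: {}".format(label, value))
```

The labels must be listed in the same order as the values the test returns, so the docstring remains the authority on what each position means.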