Feature suggestion: vector with dependent variable names for pander.lm() and such. #46

Closed
kovla opened this Issue Jun 3, 2013 · 8 comments


@kovla

A very nice feature of pander is the ability to render models (lm, etc) in Markdown syntax. This allows a very seamless integration of model presentation, as MD tables are easily converted both to PDF and docx with pandoc. The stargazer package (which can be seen as a similar tool for the purpose of model rendering) produces LaTeX code, and thus works (automatically) only for conversion to PDF. So far so good.

One of the features from stargazer I really miss in pander is the ability to specify variable names in the model output. Pander simply takes the original variable names, and this approach fails when variables are factors: R assigns their coefficients names like "gendermale" or "educationlowe". These need to be edited by hand before being put in a publishable text.
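For illustration, a minimal sketch of where such names come from, using the built-in iris data rather than the gender/education variables mentioned above:

```r
# Fit a model with a factor predictor from a built-in dataset
fit <- lm(Sepal.Width ~ Species, data = iris)

# R names each dummy coefficient by pasting the variable name and the
# factor level together; the reference level (setosa) gets no coefficient
coef_names <- names(coef(fit))
print(coef_names)
```

The pasted names ("Speciesversicolor" and so on) are what ends up in the rendered table unless they are replaced.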

It is not difficult by any means: just edit the tables manually before publication, and that's it. However, it would be better if these reader-friendly names could be specified programmatically.

Well, that and thanks for pander, it is really a pleasure working with it.

@daroczig
Rapporter member

Sorry, I am not really familiar with stargazer - despite the fact that I've decided at least 10 times that I should really get into the details to see what nice features it can offer.

As far as I see now you are missing the dep.var.labels argument of stargazer from pander.lm, right? IMHO that can be implemented in a few minutes so I would be more than happy to do so.

But could you please also suggest what other useful features stargazer offers that pander is missing? Unfortunately it's not easy (or rather impossible) to create tables as neat as stargazer's, as Pandoc does not support row/col spanning, but I am pretty sure stargazer offers a lot more flexibility too (e.g. adding stars for statistical significance).

@kovla

Hmm, that's an interesting question. Well, where stargazer goes a bit amiss to my taste is that it presents models vertically: model estimates in a single column, significance marked with stars. It is a great feature if you want to present multiple models in the same table, but if you want to give just a single model, it is best to supply all the stuff that R gives in summary(). So pander handles solo models well, but if it could present multiple models in a single table, that would make conversion to .docx so much easier. Like I said, stargazer tables have to be converted e.g. using LaTeX2rtf and then copied to Word, while pandoc does a pretty good job of converting markdown tables to Word automatically. Perhaps it is even better that there is no column spanning, since that was exactly the issue in the conversion process.

Second, stargazer presents some additional information about the model: R-square, R-square adjusted, number of observations in the model, residual std. error, F-statistic, df. This looks pretty cool, and is often useful (since by default R removes incomplete observations, and you can unknowingly include a variable with many missing values). This output can be suppressed, which is also a nice-to-have feature. This info is printed in a small font and does not clutter the table (an important thing). Maybe not all of this info for pander, but R-square adjusted and N are a good idea.
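As a rough sketch of where these numbers live in base R (not how stargazer or pander compute them internally), they can be pulled from the fitted model and its summary. The mtcars model and the `model_stats` name are only illustrative:

```r
fit <- lm(mpg ~ hp + wt, data = mtcars)
s <- summary(fit)

# nobs() counts the rows actually used, i.e. after incomplete cases were dropped
model_stats <- data.frame(
  Observations       = nobs(fit),
  Residual.Std.Error = s$sigma,
  R2                 = s$r.squared,
  Adjusted.R2        = s$adj.r.squared
)
print(model_stats, digits = 4)
```

Comparing `nobs(fit)` with `nrow()` of the input data is a quick way to spot a variable that silently removed many observations.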

Third, stargazer can include a note beneath the model, in small print e.g. about what the stars signify. Again, not sure if this can be done with Markdown. Include an additional cell with a superscript note maybe? The note can say anything, it can be a remark about the model etc.

Oh, this is a good one, and probably easy to implement. Stargazer can suppress the output on certain variables. For instance, recently I had a model with a nominal variable with 16 categories (sector), of which only one was significant, and I was not interested in sector per se, this was just a control. So I suppressed the output of that variable, which gave me a much cleaner output. I wrote in the note that the model includes sector, but it is not rendered in the table.
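A minimal base R sketch of the idea, assuming the control variable can be identified by a pattern in the coefficient names. The iris model stands in for the sector example, and `control_pattern` is a hypothetical name:

```r
fit <- lm(Sepal.Width ~ Species + Sepal.Length, data = iris)
coefs <- coef(summary(fit))

# Drop every row whose name matches the control variable;
# drop = FALSE keeps the result a matrix even if one row remains
control_pattern <- "^Species"
shown <- coefs[!grepl(control_pattern, rownames(coefs)), , drop = FALSE]
print(shown)
```

The model itself still includes the control; only the displayed rows change, so the accompanying note should say which variables were left out of the table.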

Otherwise can't think of much else really.

@daroczig
Rapporter member

Thank you very much for the great ideas, I do really appreciate that!

I did something in the above commit about your first suggestion:

> pander(lm(Sepal.Width ~ Species, data = iris), covariate.labels = c('Versicolor', 'Virginica'))
--------------------------------------------------------------
              Estimate   Std. Error   t value   Pr(>|t|) 
----------------- ---------- ------------ --------- ----------
 **Versicolor**     -0.658     0.06794     -9.685   1.832e-17 

  **Virginica**     -0.454     0.06794     -6.683   4.539e-10 

 **(Intercept)**    3.428      0.04804      71.36   5.708e-116
--------------------------------------------------------------

Table: Fitting linear model: Sepal.Width ~ Species

But will definitely tweak these options further in the next few days. Based on this new option, the "Intercept" was moved to the end of the table (just like with stargazer).
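For readers without the updated package, the same relabelling and reordering can be sketched in base R. This is an illustration of the idea, not the code from the commit, and `new_labels` is a hypothetical name:

```r
fit <- lm(Sepal.Width ~ Species, data = iris)
coefs <- coef(summary(fit))

# Reader-friendly labels for the non-intercept rows, in model order
new_labels <- c("Versicolor", "Virginica")
is_intercept <- rownames(coefs) == "(Intercept)"
stopifnot(length(new_labels) == sum(!is_intercept))

# Put the covariates first and the intercept last, then rename the rows
relabelled <- rbind(coefs[!is_intercept, , drop = FALSE],
                    coefs[is_intercept, , drop = FALSE])
rownames(relabelled) <- c(new_labels, "(Intercept)")
print(relabelled)
```

Labels are matched by position, so they must be given in the same order as the coefficients appear in the model.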

@kovla

Perfect, this representation is actually publication quality now I believe. Some journals might be picky and require more, but for general purpose writing it is a "ready-to-wear" solution for sure. Reports and such. Thank you very much, this is of great practical use.

@daroczig
Rapporter member

Thank you very much again for the feedback. I have just pushed some further updates: a basic solution to show e.g. the R-squared (which I could not merge into the current table but had to print as a separate table - which does not look too bad in HTML/odt/docx output), and also an option to suppress some rows based on a passed regular expression (with grep). Quick demo of these new features:

> library(pander)
> pander(lm(mpg ~ hp + wt, data = mtcars), summary = TRUE)

--------------------------------------------------------------
              Estimate   Std. Error   t value   Pr(>|t|) 
----------------- ---------- ------------ --------- ----------
     **hp**        -0.03177    0.00903     -3.519    0.001451 

     **wt**         -3.878      0.6327     -6.129    1.12e-06 

 **(Intercept)**    37.23       1.599       23.28   2.565e-20 
--------------------------------------------------------------


-------------------------------------------------------------
 Observations   Residual Std. Error   $R^2$   Adjusted $R^2$ 
-------------- --------------------- ------- ----------------
      32               2.593         0.8268       0.8148     
-------------------------------------------------------------

Table: Fitting linear model: mpg ~ hp + wt

> panderOptions('table.split.table', Inf)
> fit <- lm(Sepal.Width ~ Species + Sepal.Length, data = iris)
> pander(fit)

--------------------------------------------------------------------
        &nbsp;           Estimate   Std. Error   t value   Pr(>|t|) 
----------------------- ---------- ------------ --------- ----------
 **Speciesversicolor**   -0.9834     0.07207     -13.64    7.62e-28 

 **Speciesvirginica**     -1.008     0.09331      -10.8   2.407e-20 

   **Sepal.Length**       0.3499      0.0463      7.557   4.187e-12 

    **(Intercept)**       1.677       0.2354      7.123   4.456e-11 
--------------------------------------------------------------------

Table: Fitting linear model: Sepal.Width ~ Species + Sepal.Length

> pander(fit, omit = 'Species')

---------------------------------------------------------------
      &nbsp;        Estimate   Std. Error   t value   Pr(>|t|) 
------------------ ---------- ------------ --------- ----------
 **Sepal.Length**    0.3499      0.0463      7.557   4.187e-12 

 **(Intercept)**     1.677       0.2354      7.123   4.456e-11 
---------------------------------------------------------------

Table: Fitting linear model: Sepal.Width ~ Species + Sepal.Length

And of course I will keep thinking about these issues, as tons of neat things still have to be finished (and applied to other models).

@sebastianbarfort

I have only just discovered Pander, but it seems very promising. I prefer to write all my academic documents in markdown for easy conversion to .tex, .html and .docx using Pandoc. I also use Stargazer a lot to produce Latex tables, and I am having difficulties creating the same features using pander.

Two things come to mind that would be great additions to the package. The first is letting standard errors be set below the point estimate, see for example

library(stargazer)
linear.1 <- lm(rating ~ complaints + privileges + learning + raises + critical, data=attitude)
linear.2 <- lm(rating ~ complaints + privileges + learning, data=attitude)

stargazer(linear.1, linear.2, type="text", title="Regression Results", single.row=FALSE)

and allowing for customized standard errors (see for example section 2.2 here)

Overall, I think the optimal strategy would be to make stargazer export .md format as an option, but I don't know whether this is something Marek is working on.

Thanks for a great package!

@daroczig
Rapporter member

Thank you @sebastianbarfort, the idea is indeed inspiring. I used to object to such ideas because of the limitations of Pandoc's markdown (as there is no support for col/row spans), but I have started thinking about possible workarounds, as it seems that column and row spanning will never be supported by Pandoc.

As far as I see now, a fair trade-off would be implementing the output with line breaks in multi-line markdown tables, similar to what I suggested for the CrossTable class at GSoC 2014. Hopefully, a skillful student will implement this in the near future; otherwise I will get some spare time this summer to implement a stargazer-like markdown table for multiple models. Thanks again!
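A rough base R sketch of what such line-break cells could look like for the two attitude models from the stargazer example above. This is only an assumption about the layout, not the eventual implementation, and `format_cells` is a hypothetical helper:

```r
linear.1 <- lm(rating ~ complaints + privileges + learning + raises + critical,
               data = attitude)
linear.2 <- lm(rating ~ complaints + privileges + learning, data = attitude)

# Hypothetical helper: one "estimate\n(std. error)" string per term
format_cells <- function(fit, digits = 3) {
  cf <- coef(summary(fit))
  est <- formatC(cf[, "Estimate"], format = "f", digits = digits)
  se  <- formatC(cf[, "Std. Error"], format = "f", digits = digits)
  setNames(paste0(est, "\n(", se, ")"), rownames(cf))
}

models <- list("(1)" = linear.1, "(2)" = linear.2)
cells <- lapply(models, format_cells)

# Align the models on the union of their terms; absent terms become empty cells
terms <- unique(unlist(lapply(cells, names)))
side_by_side <- sapply(cells, function(x) {
  out <- x[terms]
  out[is.na(out)] <- ""
  unname(out)
})
rownames(side_by_side) <- terms
print(side_by_side)
```

The resulting character matrix has one column per model and could then be handed to a markdown table writer, provided that writer keeps line breaks inside cells.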

@daroczig
Rapporter member

Probably this is resolved with #80. But please verify, and I would of course love to hear your feedback.

@daroczig daroczig closed this Jun 9, 2014