Changes for Third Edition #57

Open
christophergandrud opened this Issue May 30, 2015 · 1 comment

christophergandrud commented May 30, 2015

Please list possible changes for the third edition here.

christophergandrud Sep 16, 2018

Owner

The 3rd edition will include a number of updates:

  • to reflect new R capabilities,

  • address URL "link rot",

  • discuss Jupyter notebooks and their use (and abuse),

  • give examples of using Amazon S3 for data hosting

  • provide a new chapter on using Docker images for stronger reproducibility.

It will also reflect experiences that I (and others) have had using these tools while working and teaching in academia and industry in the years since the 2nd edition.

It will be roughly 320 pages.

I will convert the current Rnw source to bookdown.

Required R Packages

The book itself should practice what it preaches, i.e. be reproducible. This chapter instructs readers on what R packages (and other ancillary software) to install in order to complete the examples and reproduce the book. There have been a number of improvements to the R ecosystem that make reproducing the book easier, and there are more modern packages that replace the functionality of those included in the 2nd edition.

  • Use pacman for package installation

  • Use tinytex rather than a full LaTeX installation

  • Remove ZeligBayesian and Zelig as they are no longer needed to demonstrate the capabilities discussed in Chapter 9.

  • Remove repmis. It no longer supports downloading data from Dropbox due to changes in the Dropbox API. Many of its capabilities for handling data input/output are now better handled by rio. Remove this discussion and the package from the book and largely replace with rio.

  • Give examples from installr for installing Windows dependencies such as Rtools.

  • Follow system vs. package naming conventions.

  • Not to be included in the text, but during writing use clean, minimal Docker images to ensure reproducibility from "scratch" across operating systems.

Chapter 1

  • Section 1.3: Include more detailed examples of using reproducible research in industry settings based on my recent experiences. It is particularly important for onboarding new team members, avoiding effort duplication (reintroducing previously tested features), and new data governance concerns (e.g. GDPR).

  • Section 1.5.1: Discuss tinytex in addition to a full LaTeX install.

  • Section 1.6: Update recommended books to include Xie et al. (2018) R Markdown: The Definitive Guide and Kross (2018) The Unix Workbench.

Chapter 2

  • Replace repmis function references with equivalents from rio.

  • Update Google R Style guide link #76

  • Include styler and lintr package discussions.

Chapter 3

  • Update Figure 3.1 with updated Startup Console

  • Section 3.3.4: Remove discussion of Rtex (this seems to be rarely if ever used in the wild).

  • (new) section on Jupyter notebooks with IRkernel. Also discuss notebooks generally (e.g. experience in the machine learning industry and "The First Notebook War").

  • Appendix: install.LyX from installr for installing LyX. Also check that the discussion is still up-to-date.

Chapter 4

  • New Section 4.4.1: discuss how to use the here package for more stable file path management across systems.
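
As a rough illustration of the here idea, the sketch below builds a path relative to the project root rather than the current working directory. The `data/raw/example.csv` location is a hypothetical placeholder, and the base R fallback is only there so the snippet still runs when here is not installed.

```r
# Minimal sketch: build a project-relative path with here::here().
# "data/raw/example.csv" is a hypothetical file used only for illustration.
if (requireNamespace("here", quietly = TRUE)) {
  # here::here() resolves the path from the project root, so it does not
  # depend on which directory the script happens to be run from.
  data_path <- here::here("data", "raw", "example.csv")
} else {
  # Fallback (assumption): a plain relative path, which DOES depend on
  # the current working directory.
  data_path <- file.path("data", "raw", "example.csv")
}

print(data_path)
```

Because the path is assembled from components, the same call should work on Windows, macOS, and Linux without hard-coding separators.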

Chapter 5

  • Section 5.2.2: Remove discussion of Dropbox Public folder as this is no longer supported

  • Section 5.2.2: Update data download link to: https://www.dropbox.com/s/130c5ol3o2jjmgk/public.fin.msm.model.csv?dl=1

  • Section 5.2.2: Consider using import from rio, though it doesn't hash data like source_data in repmis. So, consider keeping the repmis discussion. Also include discussion of mirroring external data sources to avoid breaking code if a dependency breaks, e.g. due to "link rot".

  • (new) Section 5.5: Storing data on Amazon S3 including comparing use to GitHub (and in version control) including file size restrictions, creating separate files per version and ability to diff versions.
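
The mirroring and hashing ideas in the Section 5.2.2 item could be sketched in base R roughly as follows. `read_mirrored_csv` is a hypothetical helper name, not a function from rio or repmis, and MD5 via `tools::md5sum` is an assumption chosen because it ships with R (repmis reports SHA-1 hashes instead).

```r
# Hypothetical helper: read a CSV from a local mirror, downloading it only
# if the mirror does not exist yet. Not part of rio or repmis.
read_mirrored_csv <- function(url, mirror_path, expected_md5 = NULL) {
  if (!file.exists(mirror_path)) {
    # First run: create the mirror so later runs no longer depend on the URL
    dir.create(dirname(mirror_path), recursive = TRUE, showWarnings = FALSE)
    utils::download.file(url, destfile = mirror_path, mode = "wb", quiet = TRUE)
  }

  # Optionally check that the mirrored file matches a previously recorded hash
  if (!is.null(expected_md5)) {
    actual_md5 <- unname(tools::md5sum(mirror_path))
    if (!identical(actual_md5, expected_md5)) {
      stop("Checksum mismatch for ", mirror_path)
    }
  }

  utils::read.csv(mirror_path, stringsAsFactors = FALSE)
}

# Example call (not run here; the mirror path is a hypothetical location):
# fin <- read_mirrored_csv(
#   "https://www.dropbox.com/s/130c5ol3o2jjmgk/public.fin.msm.model.csv?dl=1",
#   file.path("data", "mirror", "public.fin.msm.model.csv")
# )
```

Once the mirror exists, the analysis keeps working even if the original link rots, and the optional checksum flags a mirror that has silently changed.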

Chapter 6

  • Makefile DAGs (directed acyclic graphs) for visualising dependencies.

  • Section 6.1.2.3: Discuss using makefiles and LaTeX instead of rmarkdown.

  • Section 6.3: Reiterate avoiding linking to external data sets, provide a mirror when possible.

  • Section 6.3.2: Update URLs that have fallen to link rot.

  • Section 6.3.4: Update list of Data APIs and Feeds.

Chapter 7

  • Section 7.1.1: Add discussion of str.

  • (new) Section 7.1.2: tibble examples

Chapter 8

  • Introduction to "Statistical Modeling and knitr": discuss rules of thumb that researchers could apply when deciding whether code should be included in the text of a source document compiled with knitr.

  • (new) Section: discussion and examples of tidymodels.

Chapter 9

  • Highlight that a real benefit of knitr for longer documents is typically in producing tables and figures. Execution of data collection/cleaning/analysis often makes more sense with makefiles in these contexts.

  • Remove ZeligBayesian as it is no longer needed by Zelig. Also remove example in 9.3.5 with the package. Changes in Zelig have broken the example. Consider replacing with example using brms.

Chapter 10

  • Section 10.3: Update data URLs as before.

  • Section 10.4: Update caterpillar plot for new posterior densities from brms.

  • Section 10.5: Add networkD3 example.

Chapter 11

  • Section 11.1.3: Provide rmarkdown version of the LaTeX example.

  • knitr engines tikz and d3 (could add the latter as an example in Chapter 13).

    • is_html_output and is_latex_output (the latter covers PDF output) knitr functions for switching between formats.
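
A possible shape for the format-switching example, assuming the knitr helpers `is_html_output()` and `is_latex_output()` (the latter covers PDF output produced via LaTeX). `output_kind` is a hypothetical wrapper name, and the guard for a missing knitr installation is only there to keep the sketch runnable on its own.

```r
# Hypothetical wrapper: report which output format the document is being
# compiled to. Meant to be called from inside a knitr / R Markdown chunk.
output_kind <- function() {
  # Assumption: treat a missing knitr installation as "other"
  if (!requireNamespace("knitr", quietly = TRUE)) {
    return("other")
  }
  if (knitr::is_html_output()) {
    "html"
  } else if (knitr::is_latex_output()) {
    "pdf"
  } else {
    "other"
  }
}

# Example: choose between an interactive and a static figure
figure_type <- if (output_kind() == "html") "interactive" else "static"
print(figure_type)
```

Run outside of a knitr compilation, both helpers should return FALSE, so the static branch is taken by default.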

Chapter 12

  • Update rotten URLs throughout the examples (use local mirroring principle).

  • Introduction and discussion of bookdown

Chapter 13

  • Section 13.3: Add discussion of xaringan for slide shows.

  • Section 13.4.2: Remove discussion of discontinued Dropbox Public folder hosting.

(new) Chapter 14

Chapter 15 (new conclusion)

  • Remove Docker discussion