This repository has been archived by the owner on Sep 18, 2019. It is now read-only.

Render hw09_automation.html
sjackman committed Nov 9, 2016
1 parent 0f6affd commit 272fbc1
Showing 1 changed file with 91 additions and 30 deletions.
121 changes: 91 additions & 30 deletions hw09_automation.html
@@ -10,14 +10,15 @@




<title>Automating Data-analysis Pipelines</title>

<script src="libs/jquery-1.11.0/jquery.min.js"></script>
<script src="libs/jquery-1.11.3/jquery.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="libs/bootstrap-3.3.1/css/bootstrap.min.css" rel="stylesheet" />
<script src="libs/bootstrap-3.3.1/js/bootstrap.min.js"></script>
<script src="libs/bootstrap-3.3.1/shim/html5shiv.min.js"></script>
<script src="libs/bootstrap-3.3.1/shim/respond.min.js"></script>
<link href="libs/bootstrap-3.3.5/css/bootstrap.min.css" rel="stylesheet" />
<script src="libs/bootstrap-3.3.5/js/bootstrap.min.js"></script>
<script src="libs/bootstrap-3.3.5/shim/html5shiv.min.js"></script>
<script src="libs/bootstrap-3.3.5/shim/respond.min.js"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
@@ -48,6 +49,34 @@
</script>



<style type="text/css">
h1 {
font-size: 34px;
}
h1.title {
font-size: 38px;
}
h2 {
font-size: 30px;
}
h3 {
font-size: 24px;
}
h4 {
font-size: 18px;
}
h5 {
font-size: 16px;
}
h6 {
font-size: 12px;
}
.table th:not([align]) {
text-align: left;
}
</style>

<link rel="stylesheet" href="libs/local/main.css" type="text/css" />
<link rel="stylesheet" href="libs/local/nav.css" type="text/css" />
<link rel="stylesheet" href="//netdna.bootstrapcdn.com/font-awesome/4.0.3/css/font-awesome.css" type="text/css" />
@@ -66,13 +95,35 @@
color: inherit;
background-color: rgba(0, 0, 0, 0.04);
}
img {
max-width:100%;
height: auto;
img {
max-width:100%;
height: auto;
}
.tabbed-pane {
padding-top: 12px;
}
button.code-folding-btn:focus {
outline: none;
}
</style>


<div class="container-fluid main-container">

<!-- tabsets -->
<script src="libs/navigation-1.1/tabsets.js"></script>
<script>
$(document).ready(function () {
window.buildTabsets("TOC");
});
</script>

<!-- code folding -->





<header>
<div class="nav">
<a class="nav-logo" href="index.html">
@@ -88,8 +139,12 @@
</div>
</header>

<div id="header">
<h1 class="title">Automating Data-analysis Pipelines</h1>
<div class="fluid-row" id="header">



<h1 class="title toc-ignore">Automating Data-analysis Pipelines</h1>

</div>

<div id="TOC">
@@ -109,7 +164,7 @@ <h1 class="title">Automating Data-analysis Pipelines</h1>
</ul>
</div>

<p>Due Friday November 27.</p>
<p>Due Friday 2016-November-18.</p>
<div id="big-picture" class="section level1">
<h1>Big picture</h1>
<ul>
@@ -141,12 +196,10 @@ <h1>Please just tell me what to do</h1>
<h2>Download the data</h2>
<p>Download the raw data for our example, <a href="https://github.com/jennybc/gapminder/blob/master/inst/gapminder.tsv">gapminder.tsv</a>.</p>
<ul>
<li><p>Option 1: via an R script using <a href="http://cran.r-project.org/web/packages/downloader/index.html">downloader::download</a> or <a href="http://www.omegahat.org/RCurl/">RCurl::getURL</a>. note: <a href="http://stat.ethz.ch/R-manual/R-patched/library/utils/html/download.file.html">download.file</a> does not work with <code>https://</code></p>
<pre class="r"><code>downloader::download(&quot;https://raw.githubusercontent.com/jennybc/gapminder/master/inst/gapminder.tsv&quot;)
cat(file = &quot;gapminder.tsv&quot;,
RCurl::getURL(&quot;https://raw.githubusercontent.com/jennybc/gapminder/master/inst/gapminder.tsv&quot;))</code></pre></li>
<li><p>Option 1: via an R script using <a href="http://stat.ethz.ch/R-manual/R-patched/library/utils/html/download.file.html">download.file</a></p>
<pre class="r"><code>download.file(&quot;https://raw.githubusercontent.com/jennybc/gapminder/master/inst/gapminder.tsv&quot;, destfile=&quot;gapminder.tsv&quot;)</code></pre></li>
<li><p>Option 2: in a <a href="git09_shell.html">shell</a> script using <code>curl</code> or <code>wget</code>.</p>
<pre class="bash"><code>curl -O https://raw.githubusercontent.com/jennybc/gapminder/master/inst/gapminder.tsv
<pre class="bash"><code>curl -o gapminder.tsv https://raw.githubusercontent.com/jennybc/gapminder/master/inst/gapminder.tsv
wget https://raw.githubusercontent.com/jennybc/gapminder/master/inst/gapminder.tsv</code></pre></li>
</ul>
</div>
@@ -165,20 +218,31 @@ <h2>Perform statistical analyses</h2>
<ul>
<li>Import the data created in the first script.</li>
<li>Make sure your new continent order is still in force. You decide the details.</li>
<li>Fit a linear regression of life expectancy on year within each country. Write the estimated intercepts, slopes, and residual error variance (or sd) to file.</li>
<li>Fit a linear regression of life expectancy on year within each country. Write the estimated intercepts, slopes, and residual error variance (or sd) to file. The R package <code>broom</code> may be useful here.</li>
<li>Find the 3 or 4 “worst” and “best” countries for each continent. You decide the details.</li>
<li>Write the linear regression info for just these countries to file.</li>
</ul>
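<p>As a rough illustration of the regression step in the list above, here is a minimal sketch. It assumes the <code>dplyr</code> and <code>broom</code> packages are installed and that the first script saved its data as <code>gap_clean.tsv</code>. That file name and both output file names are placeholders, not part of the assignment, and centering <code>year</code> on 1952 is only one way to make the intercept interpretable.</p>
<pre class="r"><code>library(dplyr)
library(broom)

# Hypothetical input file written by the first script
gap &lt;- read.delim(&quot;gap_clean.tsv&quot;)

# One linear model per country; tidy() returns one row per coefficient
coefs &lt;- gap %&gt;%
  group_by(continent, country) %&gt;%
  do(tidy(lm(lifeExp ~ I(year - 1952), data = .)))

# glance() returns one row per model; sigma is the residual standard deviation
resid_sd &lt;- gap %&gt;%
  group_by(continent, country) %&gt;%
  do(glance(lm(lifeExp ~ I(year - 1952), data = .))) %&gt;%
  select(continent, country, sigma)

# Hypothetical output file names
write.table(coefs, &quot;lm_coefs.tsv&quot;, sep = &quot;\t&quot;, quote = FALSE, row.names = FALSE)
write.table(resid_sd, &quot;lm_resid_sd.tsv&quot;, sep = &quot;\t&quot;, quote = FALSE, row.names = FALSE)</code></pre>
<p>Choosing the “worst” and “best” countries could then be a matter of sorting <code>resid_sd</code> or the slopes within each continent, but the criterion is yours to decide.</p>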
</div>
<div id="generate-figures" class="section level2">
<h2>Generate figures</h2>
<p>Create a figure for each continent, including data only for the 6-8 “extreme” countries, and write to file. One file per continent, with an informative name. The figure should give scatterplots of life expectancy vs. year, facetting on country, fitted line overlaid.</p>
<p>Create a figure for each continent, and write one file per continent, with an informative name. The figure should give scatterplots of life expectancy vs. year, faceting on country, fitted line overlaid.</p>
</div>
<div id="automate-the-pipeline" class="section level2">
<h2>Automate the pipeline</h2>
<p>Identify and test a method of running your pipeline non-interactively.</p>
<p>You could write a master R script that simply <code>source()</code>s the three scripts, one after the other. Tip: you will probably want a second “clean up / reset” script that deletes all the output your scripts leave behind, so you can easily test and refine your strategy, i.e. without repeatedly deleting stuff “by hand”. You can run the master script or the cleaning script from a <a href="git09_shell.html">shell</a> with <code>R CMD BATCH</code> or <code>Rscript</code>.</p>
<p>Provide a link to a page that explains how your pipeline works and links to the remaining files. Your peers and the TAs should be able to go to this landing page and re-run your analysis quickly and easily. Consider including an image showing a graphical view of your pipeline.</p>
<p>Write a master R script that simply <code>source()</code>s the three scripts, one after the other. Tip: you may want a second “clean up / reset” script that deletes all the output your scripts leave behind, so you can easily test and refine your strategy, i.e. without repeatedly deleting stuff “by hand”. You can run the master script or the cleaning script from a <a href="git09_shell.html">shell</a> with <code>Rscript</code>.</p>
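<p>A master script along these lines might look like the sketch below. The three script names and the output file patterns are hypothetical placeholders; substitute whatever your own pipeline reads and writes.</p>
<pre class="r"><code># run_all.R: hypothetical master script that runs the pipeline in order
scripts &lt;- c(&quot;01_explore.R&quot;, &quot;02_analysis.R&quot;, &quot;03_figures.R&quot;)  # placeholder names
for (s in scripts) {
  message(&quot;Running &quot;, s)
  source(s)
}

# clean.R: hypothetical reset script that deletes the pipeline outputs
# The pattern below is an assumption; adjust it so it matches only generated files
outputs &lt;- list.files(pattern = &quot;\\.(tsv|png)$&quot;)
file.remove(outputs)</code></pre>
<p>From a shell, <code>Rscript run_all.R</code> would then run the whole pipeline and <code>Rscript clean.R</code> would reset it. Note that the pattern above would also delete a downloaded <code>gapminder.tsv</code>, which is only appropriate if your pipeline downloads it again.</p>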
<p>Render your R Markdown report to Markdown and HTML using <code>rmarkdown::render</code>.</p>
<ul>
<li>To render an R Markdown report and emulate RStudio’s “Knit HTML” button, use <code>rmarkdown::render('myAwesomeReport.rmd')</code>.</li>
<li>To render an R script and emulate RStudio’s “Compile Notebook” button, use <code>rmarkdown::render('myAwesomeScript.R')</code>.</li>
</ul>
<p>Write a <code>Makefile</code> to automate your pipeline using <code>make</code>. See the <a href="#links">Links</a> section below for help. This approach is also demonstrated in the examples <a href="https://github.com/STAT545-UBC/STAT545-UBC.github.io/tree/master/automation10_holding-area/02_automation-example_r-and-make">02_rAndMake</a> and <a href="https://github.com/STAT545-UBC/STAT545-UBC.github.io/tree/master/automation10_holding-area/03_automation-example_render-without-rstudio">03_knitWithoutRStudio</a>.</p>
<ul>
<li>To run an R script, use <code>Rscript myAwesomeScript.R</code>.</li>
<li>To render an R Markdown report, use <code>Rscript -e &quot;rmarkdown::render('myAwesomeReport.rmd')&quot;</code>.</li>
<li>To render an R script, use <code>Rscript -e &quot;rmarkdown::render('myAwesomeScript.R')&quot;</code>.</li>
<li>See the Makefile in <a href="https://github.com/STAT545-UBC/STAT545-UBC.github.io/tree/master/automation10_holding-area/03_automation-example_render-without-rstudio">03_knitWithoutRStudio</a> to see these commands in action.</li>
</ul>
<p>Provide a link to a <code>README.md</code> page that explains how your pipeline works and links to the remaining files. Your peers and the TAs should be able to go to this landing page and re-run your analysis quickly and easily.</p>
<p>Consider including an image showing a graphical view (the dependency diagram) of your pipeline using <a href="https://github.com/lindenb/makefile2graph">makefile2graph</a>. On Mac or Linux you can install <code>makefile2graph</code> using <a href="http://brew.sh">Homebrew</a> or <a href="http://linuxbrew.sh">Linuxbrew</a> with the command <code>brew install makefile2graph</code>.</p>
</div>
</div>
<div id="i-want-to-aim-higher" class="section level1">
@@ -192,21 +256,16 @@ <h1>I want to aim higher!</h1>
<li>Are there dates and times that need special handling? Do it!</li>
<li>Are there annoying observations that require very special handling or crap up your figures (e.g. Oceania)? Drop them!</li>
</ul>
<p>Include some dynamic report generation in your pipeline. That is, create HTML from one or more plain R or R markdown files.</p>
<ul>
<li>Example of how to emulate RStudio’s “Compile Notebook” button from a <a href="git09_shell.html">shell</a>: <code>Rscript -e &quot;rmarkdown::render('myAwesomeScript.R')&quot;</code> or using <code>knitr</code> instead of <code>rmarkdown</code> <code>Rscript -e &quot;knitr::stitch_rmd('myAwesomeScript.R')&quot;</code></li>
<li>To emulate “Knit HTML”, use <code>rmarkdown::render()</code> or knitr’s <code>knitr::knit2html()</code>.</li>
<li>See the Makefile in <a href="https://github.com/STAT545-UBC/STAT545-UBC.github.io/tree/master/automation10_holding-area/03_automation-example_render-without-rstudio">03_knitWithoutRStudio</a> to see these commands in action</li>
</ul>
<p>Experiment with running R code saved in a script from within R Markdown. Here’s some official documentation on <a href="http://yihui.name/knitr/demo/externalization/">code externalization</a>.</p>
<p>Embed pre-existing figures in and R Markdown document, i.e. an R script creates the figures, then the report incorporates them. General advice on writing figures to file is <a href="block017_write-figure-to-file.html">here</a>. See an example of this in <a href="https://github.com/jennybc/STAT545A_2013/blob/master/hw06_scaffolds/03_knitWithoutRStudio/03_doStuff.Rmd">an R Markdown file in one of the examples</a>.</p>
<p>Embed pre-existing figures in an R Markdown document, i.e. an R script creates the figures, then the report incorporates them. General advice on writing figures to file is <a href="block017_write-figure-to-file.html">here</a>. See an example of this in <a href="https://github.com/jennybc/STAT545A_2013/blob/master/hw06_scaffolds/03_knitWithoutRStudio/03_doStuff.Rmd">an R Markdown file in one of the examples</a>.</p>
<p>Import pre-existing data in an R Markdown document, then format nicely as a table.</p>
<p>Use Pandoc and/or LaTeX to explore new territory in document compilation. You could use Pandoc as an alternative to <code>rmarkdown</code> (or <code>knitr</code>) for Markdown to HTML conversion; you’d still use <code>rmarkdown</code> for conversion of R Markdown to Markdown. You would use LaTeX to get PDF output from Markdown.</p>
<p>Use <code>Make</code> to run your pipeline. See below for help. Also demonstrated in the example <a href="https://github.com/STAT545-UBC/STAT545-UBC.github.io/tree/master/automation10_holding-area/02_automation-example_r-and-make">02_rAndMake</a> and in the example <a href="https://github.com/STAT545-UBC/STAT545-UBC.github.io/tree/master/automation10_holding-area/03_automation-example_render-without-rstudio">03_knitWithoutRStudio</a></p>
</div>
<div id="links" class="section level1">
<h1>Links</h1>
<ul>
<li><a href="https://github.com/sjackman/makefile-example/">An example of a data analysis pipeline using Make</a> by <a href="http://sjackman.ca">Shaun Jackman</a></li>
<li><a href="http://sjackman.ca/makefile-slides/">Automating Data Analysis Pipelines</a> slides by <a href="http://sjackman.ca">Shaun Jackman</a></li>
<li><a href="http://kbroman.github.io/minimal_make/">An introduction to <code>Make</code></a> by Karl Broman aimed at stats / data science types</li>
<li>Blog post <a href="http://www.bendmorris.com/2013/09/using-make-for-reproducible-scientific.html">Using Make for reproducible scientific analyses</a> by Ben Morris</li>
<li><a href="http://software-carpentry.org/v4/make/index.html">Slides on <code>Make</code></a> from Software Carpentry</li>
@@ -223,6 +282,8 @@ <h1>Authors</h1>
This work is licensed under the <a href="http://creativecommons.org/licenses/by-nc/3.0/">CC BY-NC 3.0 Creative Commons License</a>.
</div>



</div>

<script>