<!DOCTYPE html>
<html >

<head>

  <meta charset="UTF-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <title>Functional programming and unit testing for data munging with R</title>
  <meta name="description" content="This book is an introduction to functional programming and unit testing with the R programming language, for the purpose of data munging">
  <meta name="generator" content="bookdown 0.5 and GitBook 2.6.7">

  <meta property="og:title" content="Functional programming and unit testing for data munging with R" />
  <meta property="og:type" content="book" />
  
  
  <meta property="og:description" content="This book is an introduction to functional programming and unit testing with the R programming language, for the purpose of data munging" />
  <meta name="github-repo" content="b-rodrigues/fput" />

  <meta name="twitter:card" content="summary" />
  <meta name="twitter:title" content="Functional programming and unit testing for data munging with R" />
  
  <meta name="twitter:description" content="This book is an introduction to functional programming and unit testing with the R programming language, for the purpose of data munging" />
  

<meta name="author" content="Bruno Rodrigues">


<meta name="date" content="2017-12-28">

  <meta name="viewport" content="width=device-width, initial-scale=1">
  <meta name="apple-mobile-web-app-capable" content="yes">
  <meta name="apple-mobile-web-app-status-bar-style" content="black">
  
  
<link rel="prev" href="unit-testing.html">
<link rel="next" href="references.html">
<script src="libs/jquery-2.2.3/jquery.min.js"></script>
<link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-bookdown.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-highlight.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-search.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-fontsettings.css" rel="stylesheet" />


<style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
  margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
code > span.dt { color: #902000; } /* DataType */
code > span.dv { color: #40a070; } /* DecVal */
code > span.bn { color: #40a070; } /* BaseN */
code > span.fl { color: #40a070; } /* Float */
code > span.ch { color: #4070a0; } /* Char */
code > span.st { color: #4070a0; } /* String */
code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
code > span.ot { color: #007020; } /* Other */
code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
code > span.fu { color: #06287e; } /* Function */
code > span.er { color: #ff0000; font-weight: bold; } /* Error */
code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
code > span.cn { color: #880000; } /* Constant */
code > span.sc { color: #4070a0; } /* SpecialChar */
code > span.vs { color: #4070a0; } /* VerbatimString */
code > span.ss { color: #bb6688; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { color: #19177c; } /* Variable */
code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code > span.op { color: #666666; } /* Operator */
code > span.bu { } /* BuiltIn */
code > span.ex { } /* Extension */
code > span.pp { color: #bc7a00; } /* Preprocessor */
code > span.at { color: #7d9029; } /* Attribute */
code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
</style>

<link rel="stylesheet" href="style.css" type="text/css" />
</head>

<body>


  <div class="book without-animation with-summary font-size-2 font-family-1" data-basepath=".">

    <div class="book-summary">
      <nav role="navigation">

<ul class="summary">
<li><a href="./">Functional programming and unit testing for data munging</a></li>

<li class="divider"></li>
<li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> Why this book?</a><ul>
<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#important-notice"><i class="fa fa-check"></i><b>1.1</b> Important notice</a></li>
<li class="chapter" data-level="1.2" data-path="index.html"><a href="index.html#motivation"><i class="fa fa-check"></i><b>1.2</b> Motivation</a></li>
<li class="chapter" data-level="1.3" data-path="index.html"><a href="index.html#who-am-i"><i class="fa fa-check"></i><b>1.3</b> Who am I?</a></li>
<li class="chapter" data-level="1.4" data-path="index.html"><a href="index.html#thanks"><i class="fa fa-check"></i><b>1.4</b> Thanks</a></li>
<li class="chapter" data-level="1.5" data-path="index.html"><a href="index.html#license"><i class="fa fa-check"></i><b>1.5</b> License</a></li>
</ul></li>
<li class="chapter" data-level="2" data-path="intro.html"><a href="intro.html"><i class="fa fa-check"></i><b>2</b> Introduction</a><ul>
<li class="chapter" data-level="2.1" data-path="intro.html"><a href="intro.html#get_r"><i class="fa fa-check"></i><b>2.1</b> Getting R</a></li>
<li class="chapter" data-level="2.2" data-path="intro.html"><a href="intro.html#fprog_overview"><i class="fa fa-check"></i><b>2.2</b> A short overview of functional programming</a></li>
<li class="chapter" data-level="2.3" data-path="intro.html"><a href="intro.html#unit_overview"><i class="fa fa-check"></i><b>2.3</b> A short overview of unit testing</a></li>
<li class="chapter" data-level="2.4" data-path="intro.html"><a href="intro.html#general-recommendations-to-follow-this-book"><i class="fa fa-check"></i><b>2.4</b> General recommendations to follow this book</a></li>
</ul></li>
<li class="chapter" data-level="3" data-path="fprog.html"><a href="fprog.html"><i class="fa fa-check"></i><b>3</b> Functional Programming</a><ul>
<li class="chapter" data-level="3.1" data-path="fprog.html"><a href="fprog.html#fprog_intro"><i class="fa fa-check"></i><b>3.1</b> Introduction</a><ul>
<li class="chapter" data-level="3.1.1" data-path="fprog.html"><a href="fprog.html#function-definitions"><i class="fa fa-check"></i><b>3.1.1</b> Function definitions</a></li>
<li class="chapter" data-level="3.1.2" data-path="fprog.html"><a href="fprog.html#properties-of-functions"><i class="fa fa-check"></i><b>3.1.2</b> Properties of functions</a></li>
</ul></li>
<li class="chapter" data-level="3.2" data-path="fprog.html"><a href="fprog.html#mapping-and-reducing-the-base-way"><i class="fa fa-check"></i><b>3.2</b> Mapping and Reducing: the <em>base</em> way</a><ul>
<li class="chapter" data-level="3.2.1" data-path="fprog.html"><a href="fprog.html#mapping-with-map-and-the-apply-family-of-functions"><i class="fa fa-check"></i><b>3.2.1</b> Mapping with <code>Map()</code> and the <code>*apply()</code> family of functions</a></li>
<li class="chapter" data-level="3.2.2" data-path="fprog.html"><a href="fprog.html#reduce"><i class="fa fa-check"></i><b>3.2.2</b> <code>Reduce()</code></a></li>
</ul></li>
<li class="chapter" data-level="3.3" data-path="fprog.html"><a href="fprog.html#map_reduce_purrr"><i class="fa fa-check"></i><b>3.3</b> Mapping and Reducing: the <code>purrr</code> way</a><ul>
<li class="chapter" data-level="3.3.1" data-path="fprog.html"><a href="fprog.html#the-map-family-of-functions"><i class="fa fa-check"></i><b>3.3.1</b> The <code>map*()</code> family of functions</a></li>
<li class="chapter" data-level="3.3.2" data-path="fprog.html"><a href="fprog.html#reducing-with-purrr"><i class="fa fa-check"></i><b>3.3.2</b> Reducing with <code>purrr</code></a></li>
</ul></li>
<li class="chapter" data-level="3.4" data-path="fprog.html"><a href="fprog.html#basic-anonymous-functions"><i class="fa fa-check"></i><b>3.4</b> Basic anonymous functions</a></li>
<li class="chapter" data-level="3.5" data-path="fprog.html"><a href="fprog.html#wrap-up"><i class="fa fa-check"></i><b>3.5</b> Wrap-up</a></li>
<li class="chapter" data-level="3.6" data-path="fprog.html"><a href="fprog.html#exercises"><i class="fa fa-check"></i><b>3.6</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="4" data-path="tidyverse.html"><a href="tidyverse.html"><i class="fa fa-check"></i><b>4</b> The <code>tidyverse</code></a><ul>
<li class="chapter" data-level="4.1" data-path="tidyverse.html"><a href="tidyverse.html#smoking-is-bad-for-you-but-pipes-are-your-friend"><i class="fa fa-check"></i><b>4.1</b> Smoking is bad for you, but pipes are your friend</a></li>
<li class="chapter" data-level="4.2" data-path="tidyverse.html"><a href="tidyverse.html#getting-data-into-r-with-readr-readxl-haven-and-what-are-tibbles"><i class="fa fa-check"></i><b>4.2</b> Getting data into R with <code>readr</code>, <code>readxl</code>, <code>haven</code> and what are <em>tibbles</em></a><ul>
<li class="chapter" data-level="4.2.1" data-path="tidyverse.html"><a href="tidyverse.html#the-swiss-army-knife-of-data-import-and-export-rio"><i class="fa fa-check"></i><b>4.2.1</b> The swiss army knife of data import and export: <code>rio</code></a></li>
</ul></li>
<li class="chapter" data-level="4.3" data-path="tidyverse.html"><a href="tidyverse.html#writing-any-object-to-disk"><i class="fa fa-check"></i><b>4.3</b> Writing any object to disk</a></li>
<li class="chapter" data-level="4.4" data-path="tidyverse.html"><a href="tidyverse.html#using-rstudio-projects-to-manage-paths"><i class="fa fa-check"></i><b>4.4</b> Using RStudio projects to manage paths</a></li>
<li class="chapter" data-level="4.5" data-path="tidyverse.html"><a href="tidyverse.html#transforming-your-data-with-dplyr"><i class="fa fa-check"></i><b>4.5</b> Transforming your data with <code>dplyr</code></a><ul>
<li class="chapter" data-level="4.5.1" data-path="tidyverse.html"><a href="tidyverse.html#filter-and-friends"><i class="fa fa-check"></i><b>4.5.1</b> <code>filter()</code> and friends</a></li>
<li class="chapter" data-level="4.5.2" data-path="tidyverse.html"><a href="tidyverse.html#select-and-its-helpers"><i class="fa fa-check"></i><b>4.5.2</b> <code>select()</code> and its helpers</a></li>
<li class="chapter" data-level="4.5.3" data-path="tidyverse.html"><a href="tidyverse.html#group_by"><i class="fa fa-check"></i><b>4.5.3</b> <code>group_by()</code></a></li>
<li class="chapter" data-level="4.5.4" data-path="tidyverse.html"><a href="tidyverse.html#summarise"><i class="fa fa-check"></i><b>4.5.4</b> <code>summarise()</code></a></li>
<li class="chapter" data-level="4.5.5" data-path="tidyverse.html"><a href="tidyverse.html#mutate-and-transmute"><i class="fa fa-check"></i><b>4.5.5</b> <code>mutate()</code> and <code>transmute()</code></a></li>
<li class="chapter" data-level="4.5.6" data-path="tidyverse.html"><a href="tidyverse.html#tally-and-count"><i class="fa fa-check"></i><b>4.5.6</b> <code>tally()</code> and <code>count()</code></a></li>
<li class="chapter" data-level="4.5.7" data-path="tidyverse.html"><a href="tidyverse.html#joining-tibbles-with-full_join-left_join-right_join-and-all-the-others"><i class="fa fa-check"></i><b>4.5.7</b> Joining <code>tibble</code>s with <code>full_join()</code>, <code>left_join()</code>, <code>right_join()</code> and all the others</a></li>
</ul></li>
<li class="chapter" data-level="4.6" data-path="tidyverse.html"><a href="tidyverse.html#tidy-your-data-with-tidyr"><i class="fa fa-check"></i><b>4.6</b> Tidy your data with <code>tidyr</code></a></li>
<li class="chapter" data-level="4.7" data-path="tidyverse.html"><a href="tidyverse.html#functional-programming-with-purrr-and-purrrlyr"><i class="fa fa-check"></i><b>4.7</b> Functional programming with <code>purrr</code> and <code>purrrlyr</code></a><ul>
<li class="chapter" data-level="4.7.1" data-path="tidyverse.html"><a href="tidyverse.html#mapping-and-reducing-with-purrr-continued"><i class="fa fa-check"></i><b>4.7.1</b> Mapping and reducing with <code>purrr</code>, continued</a></li>
<li class="chapter" data-level="4.7.2" data-path="tidyverse.html"><a href="tidyverse.html#safely-and-possibly"><i class="fa fa-check"></i><b>4.7.2</b> <code>safely()</code> and <code>possibly()</code></a></li>
<li class="chapter" data-level="4.7.3" data-path="tidyverse.html"><a href="tidyverse.html#transposing-lists"><i class="fa fa-check"></i><b>4.7.3</b> «Transposing lists»</a></li>
</ul></li>
<li class="chapter" data-level="4.8" data-path="tidyverse.html"><a href="tidyverse.html#special-packages-for-special-kinds-of-data-forcats-lubridate-and-stringr"><i class="fa fa-check"></i><b>4.8</b> Special packages for special kinds of data: <code>forcats</code>, <code>lubridate</code>, and <code>stringr</code></a><ul>
<li class="chapter" data-level="4.8.1" data-path="tidyverse.html"><a href="tidyverse.html#section"><i class="fa fa-check"></i><b>4.8.1</b> 🐈🐈🐈🐈</a></li>
</ul></li>
<li class="chapter" data-level="4.9" data-path="tidyverse.html"><a href="tidyverse.html#exercises-1"><i class="fa fa-check"></i><b>4.9</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="5" data-path="prog-tidyverse.html"><a href="prog-tidyverse.html"><i class="fa fa-check"></i><b>5</b> Programming with the <code>tidyverse</code></a></li>
<li class="chapter" data-level="6" data-path="packages.html"><a href="packages.html"><i class="fa fa-check"></i><b>6</b> Packages</a><ul>
<li class="chapter" data-level="6.1" data-path="packages.html"><a href="packages.html#why-you-need-your-own-packages-in-your-life"><i class="fa fa-check"></i><b>6.1</b> Why you need your own packages in your life</a></li>
<li class="chapter" data-level="6.2" data-path="packages.html"><a href="packages.html#r-packages-the-basics"><i class="fa fa-check"></i><b>6.2</b> R packages: the basics</a></li>
<li class="chapter" data-level="6.3" data-path="packages.html"><a href="packages.html#writing-documentation-for-your-functions"><i class="fa fa-check"></i><b>6.3</b> Writing documentation for your functions</a></li>
<li class="chapter" data-level="6.4" data-path="packages.html"><a href="packages.html#extra-files-inside-your-package-and-dependencies"><i class="fa fa-check"></i><b>6.4</b> Extra files inside your package and dependencies</a><ul>
<li class="chapter" data-level="6.4.1" data-path="packages.html"><a href="packages.html#the-namespace-file"><i class="fa fa-check"></i><b>6.4.1</b> The <code>NAMESPACE</code> file</a></li>
<li class="chapter" data-level="6.4.2" data-path="packages.html"><a href="packages.html#how-can-you-use-functions-from-other-packages-inside-your-package"><i class="fa fa-check"></i><b>6.4.2</b> How can you use functions from other packages inside your package?</a></li>
</ul></li>
<li class="chapter" data-level="6.5" data-path="packages.html"><a href="packages.html#unit-test-your-package"><i class="fa fa-check"></i><b>6.5</b> Unit test your package</a></li>
<li class="chapter" data-level="6.6" data-path="packages.html"><a href="packages.html#checking-the-coverage-of-your-unit-tests-with-covr"><i class="fa fa-check"></i><b>6.6</b> Checking the coverage of your unit tests with <code>covr</code></a></li>
<li class="chapter" data-level="6.7" data-path="packages.html"><a href="packages.html#wrap-up-1"><i class="fa fa-check"></i><b>6.7</b> Wrap-up</a></li>
</ul></li>
<li class="chapter" data-level="7" data-path="unit-testing.html"><a href="unit-testing.html"><i class="fa fa-check"></i><b>7</b> Unit testing</a><ul>
<li class="chapter" data-level="7.1" data-path="unit-testing.html"><a href="unit-testing.html#introduction"><i class="fa fa-check"></i><b>7.1</b> Introduction</a></li>
<li class="chapter" data-level="7.2" data-path="unit-testing.html"><a href="unit-testing.html#unit-testing-with-the-testthat-package"><i class="fa fa-check"></i><b>7.2</b> Unit testing with the <code>testthat</code> package</a></li>
<li class="chapter" data-level="7.3" data-path="unit-testing.html"><a href="unit-testing.html#actually-running-your-tests"><i class="fa fa-check"></i><b>7.3</b> Actually running your tests</a></li>
<li class="chapter" data-level="7.4" data-path="unit-testing.html"><a href="unit-testing.html#wrap-up-2"><i class="fa fa-check"></i><b>7.4</b> Wrap-up</a></li>
<li class="chapter" data-level="7.5" data-path="unit-testing.html"><a href="unit-testing.html#exercises-2"><i class="fa fa-check"></i><b>7.5</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="8" data-path="putting-it-all-together-writing-a-package-to-work-on-data.html"><a href="putting-it-all-together-writing-a-package-to-work-on-data.html"><i class="fa fa-check"></i><b>8</b> Putting it all together: writing a package to work on data</a><ul>
<li class="chapter" data-level="8.1" data-path="putting-it-all-together-writing-a-package-to-work-on-data.html"><a href="putting-it-all-together-writing-a-package-to-work-on-data.html#getting-the-data"><i class="fa fa-check"></i><b>8.1</b> Getting the data</a></li>
<li class="chapter" data-level="8.2" data-path="putting-it-all-together-writing-a-package-to-work-on-data.html"><a href="putting-it-all-together-writing-a-package-to-work-on-data.html#your-first-data-munging-package-preparedata"><i class="fa fa-check"></i><b>8.2</b> Your first data munging package: <code>prepareData</code></a><ul>
<li class="chapter" data-level="8.2.1" data-path="putting-it-all-together-writing-a-package-to-work-on-data.html"><a href="putting-it-all-together-writing-a-package-to-work-on-data.html#reading-a-lot-of-datasets-at-once"><i class="fa fa-check"></i><b>8.2.1</b> Reading a lot of datasets at once</a></li>
<li class="chapter" data-level="8.2.2" data-path="putting-it-all-together-writing-a-package-to-work-on-data.html"><a href="putting-it-all-together-writing-a-package-to-work-on-data.html#treating-the-columns-of-your-datasets"><i class="fa fa-check"></i><b>8.2.2</b> Treating the columns of your datasets</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="" data-path="references.html"><a href="references.html"><i class="fa fa-check"></i>References</a></li>
<li class="divider"></li>
<li><a href="https://github.com/rstudio/bookdown" target="blank">Published with bookdown</a></li>

</ul>

      </nav>
    </div>

    <div class="book-body">
      <div class="body-inner">
        <div class="book-header" role="navigation">
          <h1>
            <i class="fa fa-circle-o-notch fa-spin"></i><a href="./">Functional programming and unit testing for data munging with R</a>
          </h1>
        </div>

        <div class="page-wrapper" tabindex="-1" role="main">
          <div class="page-inner">

            <section class="normal" id="section-">
<div id="putting-it-all-together-writing-a-package-to-work-on-data" class="section level1">
<h1><span class="header-section-number">Chapter 8</span> Putting it all together: writing a package to work on data</h1>
<p>Everything we have seen until now allows us to develop our own packages with the goal of <em>working</em> on data. By <em>working</em> on data I mean any operation that involves cleaning, transforming, analyzing or plotting data. I will summarize why everything we have seen until now helps us in this task:</p>
<ol style="list-style-type: decimal">
<li>Functional programming makes our code easier to test</li>
<li>Unit tests make sure our code is correct</li>
<li>Packages allow us to forget about paths, which makes unit tests easier to run; they also make writing documentation and sharing our code easier</li>
</ol>
<p>For the rest of this chapter we are going to work with mock datasets that I created. The data is completely random but for our purposes it does not matter. In this chapter, we are going to write a number of functions with the goal of going from these awful, badly formatted datasets to a nice longitudinal data set.</p>
<div id="getting-the-data" class="section level2">
<h2><span class="header-section-number">8.1</span> Getting the data</h2>
<p>You can download the data from the <a href="https://github.com/b-rodrigues/functional_programming_and_unit_testing_for_data_munging">GitHub repository</a> of the book. There are 5 <code>.csv</code> files that make up the datasets we are going to work with:</p>
<ul>
<li><code>data_2000.csv</code></li>
<li><code>data_2001.csv</code></li>
<li><code>data_2002.csv</code></li>
<li><code>data_2003.csv</code></li>
<li><code>data_2004.csv</code></li>
</ul>
<p>The first step, of course, is to load these datasets into R. For 5 datasets, I assume that you would simply write the following into RStudio:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">data_<span class="dv">2000</span> &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">&quot;/path/to/data/data_2000.csv&quot;</span>, <span class="dt">header =</span> T)
data_<span class="dv">2001</span> &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">&quot;/path/to/data/data_2001.csv&quot;</span>, <span class="dt">header =</span> T)
data_<span class="dv">2002</span> &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">&quot;/path/to/data/data_2002.csv&quot;</span>, <span class="dt">header =</span> T)
data_<span class="dv">2003</span> &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">&quot;/path/to/data/data_2003.csv&quot;</span>, <span class="dt">header =</span> T)
data_<span class="dv">2004</span> &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">&quot;/path/to/data/data_2004.csv&quot;</span>, <span class="dt">header =</span> T)</code></pre></div>
<p>This might be ok for 5 datasets whose names are very similar, especially since you can do block editing in RStudio. But imagine that you have hundreds, or even thousands, of datasets, and that their names are not as well formatted as they are here. We will start our package by writing a function that reads a lot of datasets at once.</p>
</div>
<div id="your-first-data-munging-package-preparedata" class="section level2">
<h2><span class="header-section-number">8.2</span> Your first data munging package: <code>prepareData</code></h2>
<div id="reading-a-lot-of-datasets-at-once" class="section level3">
<h3><span class="header-section-number">8.2.1</span> Reading a lot of datasets at once</h3>
<p>Using RStudio, create a new project as shown earlier in the book, and select <em>R package</em>. Give it a name, for example <code>prepareData</code>. If you are working with datasets that have a name, for example the <em>Penn World Tables</em>, you could call your package <code>preparePWT</code>, or something similar. By the way, we are going to work on some test datasets that I created for illustration purposes. When you develop your own package to work on your own data, you do not have to write unit tests that use your original data. A subset can be enough, or taking the time to create a small test dataset might be preferable. It depends on what features of your functions you want to test. The first function I will show you is actually very general and could work with any dataset. This is why I keep it in a package called <code>broTools</code><a href="#fn2" class="footnoteRef" id="fnref2"><sup>2</sup></a> that contains all the little functions that I use daily. But for illustration purposes, we will put this function inside <code>prepareData</code>, even if it does not have anything directly to do with it. I have called this function <code>read_list()</code> and here is the source code:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co">#&#39; Reads a list of datasets</span>
<span class="co">#&#39; @param list_of_datasets A list of datasets (names of datasets are strings)</span>
<span class="co">#&#39; @param read_func A function, the read function to use to read the data</span>
<span class="co">#&#39; @param ... Further arguments passed on to read_func</span>
<span class="co">#&#39; @return Returns a list of the datasets</span>
<span class="co">#&#39; @export</span>
<span class="co">#&#39; @examples</span>
<span class="co">#&#39; \dontrun{</span>
<span class="co">#&#39; setwd(&quot;path/to/datasets/&quot;)</span>
<span class="co">#&#39; list_of_datasets &lt;- list.files(pattern = &quot;[.]csv$&quot;)</span>
<span class="co">#&#39; list_of_loaded_datasets &lt;- read_list(list_of_datasets, read_func = read.csv)</span>
<span class="co">#&#39; }</span>
read_list &lt;-<span class="st"> </span><span class="cf">function</span>(list_of_datasets, read_func, ...){

    <span class="kw">stopifnot</span>(<span class="kw">length</span>(list_of_datasets)<span class="op">&gt;</span><span class="dv">0</span>)

    read_and_assign &lt;-<span class="st"> </span><span class="cf">function</span>(dataset, read_func){
        dataset_name &lt;-<span class="st"> </span><span class="kw">as.name</span>(dataset)
        dataset_name &lt;-<span class="st"> </span><span class="kw">read_func</span>(dataset, ...)
    }

    <span class="co"># invisible is used to suppress the unneeded output</span>
    output &lt;-<span class="st"> </span><span class="kw">invisible</span>(
        purrr<span class="op">::</span><span class="kw">map</span>(list_of_datasets,
                   read_and_assign,
                   <span class="dt">read_func =</span> read_func)
                   )

    <span class="co"># Remove the &quot;.csv&quot; at the end of the data set names</span>
    names_of_datasets &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="kw">unlist</span>(<span class="kw">strsplit</span>(list_of_datasets, <span class="st">&quot;[.]&quot;</span>))[<span class="kw">c</span>(<span class="ot">TRUE</span>, <span class="ot">FALSE</span>)])
    <span class="kw">names</span>(output) &lt;-<span class="st"> </span>names_of_datasets
    <span class="kw">return</span>(output)
}</code></pre></div>
<p>The basic idea of <code>read_list()</code> is that it takes the names of the datasets as its first argument, a function to read in the datasets as its second argument, and as a third argument the famous <code>...</code>, which lets the user pass further options on to functions that are called in the body of the main function. In this case, the further arguments are passed to <code>read_func</code>: for example, if your data does not contain headers, you could pass the option <code>header = FALSE</code> to <code>read_list()</code>, which would then get passed to <code>read_func</code>. I use <code>purrr::map()</code> to apply <code>read_and_assign()</code>, a helper function whose role is to read in a single dataset, to the whole list of datasets. This step is wrapped inside <code>invisible()</code> to remove unnecessary output. Finally, I use <code>strsplit()</code> with a regular expression to remove the extension of the dataset from its name. The output is thus a list of datasets where each dataset is named after its file, without the extension.</p>
<p>Save this function in a script called <code>read_list.R</code> in the <code>R</code> folder of your package. Now you need to invoke <code>roxygen2::roxygenise()</code> to create the documentation of your function. I suggest you also run <code>devtools::use_testthat()</code> (in more recent versions of <code>devtools</code>, this helper has moved to the <code>usethis</code> package). This creates the necessary folder to hold your tests, as well as a small <code>testthat.R</code> file with the code that gets called to run your tests. Without this, you might encounter weird issues (for example, <code>covr</code> not finding your tests!).</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">roxygen2<span class="op">::</span><span class="kw">roxygenise</span>()</code></pre></div>
<pre><code>First time using roxygen2. Upgrading automatically...
Updating roxygen version in  /home/bro/Dropbox/prepareData/DESCRIPTION
Writing NAMESPACE
Writing read_list.Rd</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">devtools<span class="op">::</span><span class="kw">use_testthat</span>()</code></pre></div>
<pre><code>* Adding testthat to Suggests
* Creating `tests/testthat`.
* Creating `tests/testthat.R` from template.</code></pre>
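<p>The forwarding of <code>...</code> inside <code>read_list()</code> can be tried out in isolation. The following is only a sketch under assumptions: <code>apply_reader()</code> is a hypothetical stand-in written for this illustration (it is not part of <code>prepareData</code>), it uses <code>lapply()</code> from base R instead of <code>purrr::map()</code>, and it reads a throwaway file from a temporary location instead of the data of the book.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper, for illustration only (not part of prepareData)
apply_reader &lt;- function(paths, read_func, ...){
    # Every extra argument captured by `...` is handed over to read_func
    lapply(paths, function(path) read_func(path, ...))
}

# A small file without a header line, written to a temporary location
tmp &lt;- tempfile(fileext = &quot;.csv&quot;)
writeLines(c(&quot;1,a&quot;, &quot;2,b&quot;), tmp)

# header = FALSE is not an argument of apply_reader(): it travels through `...`
with_dots &lt;- apply_reader(tmp, read.csv, header = FALSE)
without_dots &lt;- apply_reader(tmp, read.csv)

nrow(with_dots[[1]])     # both lines are read as data
nrow(without_dots[[1]])  # the first line is consumed as the header</code></pre></div>
<p>Because the wrapper never names <code>header</code> itself, the same function keeps working if you swap <code>read.csv</code> for another read function with different options.</p>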
<p>Now let us check the coverage of our package:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(<span class="st">&quot;covr&quot;</span>)

cov &lt;-<span class="st"> </span><span class="kw">package_coverage</span>()

<span class="kw">shine</span>(cov)</code></pre></div>
<p>Unsurprisingly, we get a coverage of 0% for our package. We will now write a unit test for this function. For example, let us see if the condition <code>stopifnot(length(list_of_datasets)&gt;0)</code> works. Because you ran <code>devtools::use_testthat()</code>, you should have a folder called <code>tests</code> at the root of your project directory. In it, there is a folder called <code>testthat</code>. This is where you will save your unit tests, and any file needed for the tests to run (for example, mock datasets that are used by tests).</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(<span class="st">&quot;testthat&quot;</span>)
<span class="kw">library</span>(<span class="st">&quot;prepareData&quot;</span>)

<span class="kw">test_that</span>(<span class="st">&quot;Try to import empty list of datasets: this may be caused because</span>
<span class="st">          the path to the datasets is wrong for instance&quot;</span>,{

    list_datasets &lt;-<span class="st"> </span><span class="ot">NULL</span>

    <span class="kw">expect_error</span>(<span class="kw">read_list</span>(list_datasets, read_csv, <span class="dt">col_types =</span> <span class="kw">cols</span>()))
})</code></pre></div>
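<p>The test above relies on two facts about base R: an empty input has length 0, and <code>stopifnot()</code> signals an error when its condition is not met. Here is a minimal sketch of this behaviour outside of <code>testthat</code>, assuming a hypothetical <code>check_not_empty()</code> that contains nothing but the guard from <code>read_list()</code>. The case of <code>character(0)</code> is included because this is what <code>list.files()</code> and <code>Sys.glob()</code> return when they find nothing, for example when the path is wrong.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical function, for illustration only: just the guard used in read_list()
check_not_empty &lt;- function(list_of_datasets){
    stopifnot(length(list_of_datasets) &gt; 0)
    TRUE
}

# tryCatch() turns the error into a value, so that it can be inspected
on_error &lt;- function(e) &quot;error&quot;

result_null  &lt;- tryCatch(check_not_empty(NULL), error = on_error)
result_empty &lt;- tryCatch(check_not_empty(character(0)), error = on_error)
result_ok    &lt;- tryCatch(check_not_empty(&quot;data_2000.csv&quot;), error = on_error)

result_null   # the guard fired
result_empty  # the guard fired
result_ok     # the guard let the input through</code></pre></div>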
<p>Run the test using <code>CTRL-SHIFT-T</code> if you are using RStudio.</p>
<pre><code>==&gt; devtools::test()

Loading prepareData
Loading required package: testthat
Testing prepareData
.
DONE ===========================================================================</code></pre>
<p>This is the output you should see. If you check the coverage of your package, you should see that the line <code>stopifnot(length(list_of_datasets)&gt;0)</code> is highlighted in green and you should have around 9% of coverage for your package. You can spend some time getting the coverage as high as possible, but you have to weigh the time it will take you to write tests against the benefits you are going to get from them. In the case of this function, I do not really see what more you could test.</p>
<p>Let us use this function to read in the datasets:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(<span class="st">&quot;readr&quot;</span>)
<span class="kw">library</span>(<span class="st">&quot;purrr&quot;</span>)
<span class="kw">library</span>(<span class="st">&quot;tibble&quot;</span>)

list_of_data &lt;-<span class="st"> </span><span class="kw">Sys.glob</span>(<span class="st">&quot;assets/*.csv&quot;</span>)

datasets &lt;-<span class="st"> </span><span class="kw">read_list</span>(list_of_data, read_csv, <span class="dt">col_types =</span> <span class="kw">cols</span>())</code></pre></div>
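<p>The way the file names are collected matters for what ends up in the list. The sketch below compares <code>Sys.glob()</code>, which expands a wildcard, with <code>list.files()</code>, whose <code>pattern</code> argument is a regular expression, and shows <code>tools::file_path_sans_ext()</code> as one possible way to get names without folder or extension. The temporary folder and the file names are made up for this illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Made-up files in a temporary folder, standing in for the assets folder
folder &lt;- file.path(tempdir(), &quot;glob_demo&quot;)
dir.create(folder, showWarnings = FALSE)
invisible(file.create(file.path(folder, c(&quot;data_2000.csv&quot;, &quot;data_2001.csv&quot;, &quot;notes.txt&quot;))))

# Sys.glob() takes a wildcard and returns the matching paths, folder included
globbed &lt;- Sys.glob(file.path(folder, &quot;*.csv&quot;))

# list.files() takes a folder and a regular expression;
# full.names = TRUE is needed to get the folder back in the result
listed &lt;- list.files(folder, pattern = &quot;[.]csv$&quot;, full.names = TRUE)

# Both approaches find the same two .csv files and skip notes.txt
basename(globbed)
basename(listed)

# One way to derive names without folder and extension
clean_names &lt;- tools::file_path_sans_ext(basename(globbed))
clean_names</code></pre></div>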
<p><code>list_of_data</code> is a variable that contains the paths to the datasets. I used <code>Sys.glob(&quot;assets/*.csv&quot;)</code> to find the datasets. The datasets are saved in the <code>assets</code> folder of the book and end with the <code>.csv</code> extension. You could also use <code>list.files(&quot;assets&quot;, pattern = &quot;[.]csv$&quot;, full.names = TRUE)</code> to achieve the same. Let’s take a look inside this list using <code>head()</code>. Calling <code>head()</code> directly on the list would return its first elements rather than the first rows of each dataset, so we use <code>map()</code> to apply <code>head()</code> to each data frame in the list.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">map</span>(datasets, head)</code></pre></div>
<pre><code>## $`assets/data_2000`
## # A tibble: 6 x 6
##      id Variable1 other2000 gender2000 eggs2000      spam2000
##   &lt;int&gt;     &lt;int&gt;     &lt;int&gt;      &lt;chr&gt;    &lt;int&gt;         &lt;chr&gt;
## 1     1        32         3          F       80 -1.5035369157
## 2     2        28         2          F       20 -0.1836726393
## 3     3        36         4          M       58 -0.6851988608
## 4     4        28         1          F       30  1.9900760191
## 5     5        34         3          F       14  0.4324725273
## 6     6        30         3          F       40   -0.79001853
## 
## $`assets/data_2001`
## # A tibble: 6 x 6
##      id VARIABLE1 other2001 Gender2001 eggs2001   spam2001
##   &lt;int&gt;     &lt;int&gt;     &lt;int&gt;      &lt;chr&gt;    &lt;int&gt;      &lt;dbl&gt;
## 1     1        32         3          F       80 -1.5035369
## 2     2        28         2          F       20 -0.1836726
## 3     3        36         4          M       58 -0.6851989
## 4     4        28         1          F       30  1.9900760
## 5     5        34         3          F       14  0.4324725
## 6     6        30         3          F       40 -0.7900185
## 
## $`assets/data_2002`
## # A tibble: 6 x 6
##      ID variable1 Other2002 gender2002 eggs2002   Spam2002
##   &lt;int&gt;     &lt;int&gt;     &lt;int&gt;      &lt;chr&gt;    &lt;int&gt;      &lt;dbl&gt;
## 1     1        32         3          F       80 -1.5035369
## 2     2        28         2          F       20 -0.1836726
## 3     3        36         4          M       58 -0.6851989
## 4     4        28         1          F       30  1.9900760
## 5     5        34         3          F       14  0.4324725
## 6     6        30         3          F       40 -0.7900185
## 
## $`assets/data_2003`
## # A tibble: 6 x 6
##      id variable1 other2003 gender2003 EGGS2003   spam2003
##   &lt;int&gt;     &lt;int&gt;     &lt;int&gt;      &lt;chr&gt;    &lt;int&gt;      &lt;dbl&gt;
## 1     1        32         3          F       80 -1.5035369
## 2     2        28         2          F       20 -0.1836726
## 3     3        36         4          M       58 -0.6851989
## 4     4        28         1          F       30  1.9900760
## 5     5        34         3          F       14  0.4324725
## 6     6        30         3          F       40 -0.7900185
## 
## $`assets/data_2004`
## # A tibble: 6 x 6
##      Id Variable1 Other2004 Gender2004 Eggs2004   Spam2004
##   &lt;int&gt;     &lt;int&gt;     &lt;int&gt;      &lt;chr&gt;    &lt;int&gt;      &lt;dbl&gt;
## 1     1        32         3          F       80 -1.5035369
## 2     2        28         2          F       20 -0.1836726
## 3     3        36         4          M       58 -0.6851989
## 4     4        28         1          F       30  1.9900760
## 5     5        34         3          F       14  0.4324725
## 6     6        30         3          F       40 -0.7900185</code></pre>
<p>The datasets we will work with all have the same variables and the same individuals. We have datasets for the years 2000 to 2004. It would be much better for analysis if we had clean variable names and merged all the datasets together into a single, longitudinal dataset. In short, what we need is to:</p>
<ul>
<li>Have nice names for the columns.</li>
<li>Remove the year from the name of the columns and add a column containing the year.</li>
<li>Merge every dataset together.</li>
</ul>
<p>This is to make the dataset tidy, as explained in <span class="citation">Wickham (<a href="#ref-wickham2014tidy">2014</a><a href="#ref-wickham2014tidy">b</a>)</span>. Of course, depending on your needs, you might need to add further operations, for example creating new variables. For now, we are going to focus on these three steps.</p>
</div>
<div id="treating-the-columns-of-your-datasets" class="section level3">
<h3><span class="header-section-number">8.2.2</span> Treating the columns of your datasets</h3>
<p>Let us take a look at the column names of the datasets:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">map</span>(datasets, colnames)</code></pre></div>
<pre><code>## $`assets/data_2000`
## [1] &quot;id&quot;         &quot;Variable1&quot;  &quot;other2000&quot;  &quot;gender2000&quot; &quot;eggs2000&quot;  
## [6] &quot;spam2000&quot;  
## 
## $`assets/data_2001`
## [1] &quot;id&quot;         &quot;VARIABLE1&quot;  &quot;other2001&quot;  &quot;Gender2001&quot; &quot;eggs2001&quot;  
## [6] &quot;spam2001&quot;  
## 
## $`assets/data_2002`
## [1] &quot;ID&quot;         &quot;variable1&quot;  &quot;Other2002&quot;  &quot;gender2002&quot; &quot;eggs2002&quot;  
## [6] &quot;Spam2002&quot;  
## 
## $`assets/data_2003`
## [1] &quot;id&quot;         &quot;variable1&quot;  &quot;other2003&quot;  &quot;gender2003&quot; &quot;EGGS2003&quot;  
## [6] &quot;spam2003&quot;  
## 
## $`assets/data_2004`
## [1] &quot;Id&quot;         &quot;Variable1&quot;  &quot;Other2004&quot;  &quot;Gender2004&quot; &quot;Eggs2004&quot;  
## [6] &quot;Spam2004&quot;</code></pre>
<p>This is very messy: we need a function that cleans up this mess and “normalizes” the column names. It turns out that we are lucky, because the <code>janitor</code> package provides exactly what we are looking for: <code>janitor::clean_names()</code>. Let’s use it and see the output:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(<span class="st">&quot;janitor&quot;</span>)

datasets &lt;-<span class="st"> </span><span class="kw">map</span>(datasets, clean_names)

<span class="kw">map</span>(datasets, colnames)</code></pre></div>
<pre><code>## $`assets/data_2000`
## [1] &quot;id&quot;         &quot;variable1&quot;  &quot;other2000&quot;  &quot;gender2000&quot; &quot;eggs2000&quot;  
## [6] &quot;spam2000&quot;  
## 
## $`assets/data_2001`
## [1] &quot;id&quot;         &quot;variable1&quot;  &quot;other2001&quot;  &quot;gender2001&quot; &quot;eggs2001&quot;  
## [6] &quot;spam2001&quot;  
## 
## $`assets/data_2002`
## [1] &quot;id&quot;         &quot;variable1&quot;  &quot;other2002&quot;  &quot;gender2002&quot; &quot;eggs2002&quot;  
## [6] &quot;spam2002&quot;  
## 
## $`assets/data_2003`
## [1] &quot;id&quot;         &quot;variable1&quot;  &quot;other2003&quot;  &quot;gender2003&quot; &quot;eggs2003&quot;  
## [6] &quot;spam2003&quot;  
## 
## $`assets/data_2004`
## [1] &quot;id&quot;         &quot;variable1&quot;  &quot;other2004&quot;  &quot;gender2004&quot; &quot;eggs2004&quot;  
## [6] &quot;spam2004&quot;</code></pre>
<p>This is much better. If <code>clean_names()</code> didn’t exist, you would have had to write your own function for this. This could have been a complicated exercise, depending on how messy and heterogeneous the variable names are in your data. However, <code>clean_names()</code> does a great job, so there’s no need to reinvent the wheel!</p>
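<p>To give an idea of what such a hand-written function could look like, here is a minimal sketch in base R. The name <code>normalize_names()</code> is hypothetical, and the function only covers the simple cases seen here (mixed case, spaces or punctuation); it is not a reimplementation of <code>janitor::clean_names()</code>, which handles many more situations.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper, not part of janitor: lower-case the names and
# replace every run of non-alphanumeric characters with an underscore.
normalize_names &lt;- function(dataset){
  new_names &lt;- tolower(colnames(dataset))
  new_names &lt;- gsub(&quot;[^a-z0-9]+&quot;, &quot;_&quot;, new_names)
  # Drop underscores left at the start or at the end of a name
  new_names &lt;- gsub(&quot;^_+|_+$&quot;, &quot;&quot;, new_names)
  colnames(dataset) &lt;- new_names
  dataset
}

# A toy data frame with messy names (check.names = FALSE keeps the space)
messy &lt;- data.frame(ID = 1:2, VARIABLE1 = c(32, 28), `Other 2002` = c(3, 2),
                    check.names = FALSE)

# Returns c(&quot;id&quot;, &quot;variable1&quot;, &quot;other_2002&quot;)
colnames(normalize_names(messy))</code></pre></div>
<p>Even this small version needs two regular expressions, which is a good reason to rely on a well-tested package function when one exists.</p>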
<p>Now we would like to remove the years from the column names and add a column containing the year of each dataset. Let us start by removing the years from the column names by writing a function. For this function, a little regular expression knowledge will not hurt. Here is what the function looks like:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co">#&#39; Remove year strings from column names</span>
<span class="co">#&#39; @param list_of_datasets A list containing named datasets</span>
<span class="co">#&#39; @return A list of datasets with the year strings removed from the column names</span>
<span class="co">#&#39; @description This function removes year strings from column names, meaning that a column called</span>
<span class="co">#&#39; &quot;eggs9000&quot; gets renamed into &quot;eggs&quot;</span>
<span class="co">#&#39; @export</span>
<span class="co">#&#39; @examples</span>
<span class="co">#&#39; \dontrun{</span>
<span class="co">#&#39; #`list_of_data_sets` is a list containing named data sets</span>
<span class="co">#&#39; # For example, to access the first data set, called dataset_1 you would</span>
<span class="co">#&#39; # write</span>
<span class="co">#&#39; list_of_data_sets$dataset_1</span>
<span class="co">#&#39; remove_years_from_strings(list_of_data_sets)</span>
<span class="co">#&#39; }</span>
remove_years_from_strings &lt;-<span class="st"> </span><span class="cf">function</span>(list_of_datasets){

  for_one_dataset &lt;-<span class="st"> </span><span class="cf">function</span>(dataset){
    <span class="co"># strsplit() accepts regular expressions, so it&#39;s easy to get rid of a number made up of</span>
    <span class="co"># *exactly* 4 digits</span>

    <span class="kw">colnames</span>(dataset) &lt;-<span class="st"> </span><span class="kw">unlist</span>(<span class="kw">strsplit</span>(<span class="kw">colnames</span>(dataset), <span class="st">&quot;</span><span class="ch">\\</span><span class="st">d{4}&quot;</span>, <span class="dt">perl =</span> <span class="ot">TRUE</span>))
    <span class="kw">return</span>(dataset)
  }

  output &lt;-<span class="st"> </span>purrr<span class="op">::</span><span class="kw">map</span>(list_of_datasets, for_one_dataset)

  <span class="kw">return</span>(output)
}</code></pre></div>
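<p>One caveat of the <code>strsplit()</code> approach is that it relies on every column name producing exactly one piece. As a hedged alternative, the sketch below uses <code>gsub()</code>, which always returns one name per column. The function name <code>remove_years_with_gsub()</code> and the toy column <code>spam2001_total</code> are made up for the illustration, and the helper works on a single data frame rather than on a list.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical variant of for_one_dataset() using gsub() instead of strsplit()
remove_years_with_gsub &lt;- function(dataset){
  # Delete every run of exactly four digits that is not part of a longer number
  colnames(dataset) &lt;- gsub(&quot;(?&lt;!\\d)\\d{4}(?!\\d)&quot;, &quot;&quot;, colnames(dataset), perl = TRUE)
  dataset
}

toy &lt;- data.frame(id = 1:2, eggs2001 = c(80, 20), spam2001_total = c(1.5, 0.2))

# strsplit() yields two pieces for &quot;spam2001_total&quot;, so this vector has
# four elements for three columns
unlist(strsplit(colnames(toy), &quot;\\d{4}&quot;, perl = TRUE))

# gsub() returns one name per column: &quot;id&quot;, &quot;eggs&quot;, &quot;spam_total&quot;
colnames(remove_years_with_gsub(toy))</code></pre></div>
<p>For the datasets of this chapter, where the year always sits at the end of the name, both approaches give the same result.</p>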
<p>Here is the accompanying unit test:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(<span class="st">&quot;testthat&quot;</span>)
<span class="kw">library</span>(<span class="st">&quot;prepareData&quot;</span>)
<span class="kw">library</span>(<span class="st">&quot;readr&quot;</span>)

data_sets &lt;-<span class="st"> </span><span class="kw">list.files</span>(<span class="dt">pattern =</span> <span class="st">&quot;2001&quot;</span>)

data_list &lt;-<span class="st"> </span><span class="kw">read_list</span>(data_sets, read_csv, <span class="dt">col_types =</span> <span class="kw">cols</span>())

<span class="kw">test_that</span>(<span class="st">&quot;Test remove years from strings&quot;</span>,{
    data_list_result &lt;-<span class="st"> </span>purrr<span class="op">::</span><span class="kw">map</span>(data_list, janitor<span class="op">::</span>clean_names)
    data_list_result &lt;-<span class="st"> </span><span class="kw">remove_years_from_strings</span>(data_list_result)
    expect &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="st">&quot;id&quot;</span>, <span class="st">&quot;variable1&quot;</span>, <span class="st">&quot;other&quot;</span>, <span class="st">&quot;gender&quot;</span>, <span class="st">&quot;eggs&quot;</span>, <span class="st">&quot;spam&quot;</span>)
    actual &lt;-<span class="st"> </span><span class="kw">colnames</span>(data_list_result[[<span class="dv">1</span>]])
    <span class="kw">expect_equal</span>(expect, actual)
})</code></pre></div>
<p>For the unit test to work, I had to add the dataset for the year 2001 in the <code>tests/testthat</code> directory. Again, this dataset does not have to be the real dataset you will ultimately be working on. A mock dataset with simulated data on 10 rows and with the same column names works exactly the same!</p>
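<p>As an illustration, such a mock dataset could be simulated along the following lines. This is only a sketch: the values are random, the object names <code>mock_2001</code> and <code>mock_path</code> are made up, and the file is written to a temporary directory so that the example runs anywhere. In the package, you would write it to <code>tests/testthat</code> instead.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Simulate a small stand-in for the 2001 dataset; the values are random and
# only the column names mirror the real file.
set.seed(1234)

mock_2001 &lt;- data.frame(
  id         = 1:10,
  VARIABLE1  = sample(20:40, 10, replace = TRUE),
  other2001  = sample(1:4, 10, replace = TRUE),
  Gender2001 = sample(c(&quot;F&quot;, &quot;M&quot;), 10, replace = TRUE),
  eggs2001   = sample(10:80, 10, replace = TRUE),
  spam2001   = rnorm(10)
)

# A temporary directory is used here; replace it with tests/testthat
# when working inside the package.
mock_path &lt;- file.path(tempdir(), &quot;data_2001.csv&quot;)
write.csv(mock_2001, mock_path, row.names = FALSE)

# Check that the file can be read back with the expected column names
colnames(read.csv(mock_path))</code></pre></div>
<p>If a test compares values that depend on the number of rows, remember to adapt the expected values to the size of the mock dataset.</p>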
<p>Let’s take a look at the output:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">datasets &lt;-<span class="st"> </span><span class="kw">remove_years_from_strings</span>(datasets)

<span class="kw">map</span>(datasets, colnames)</code></pre></div>
<pre><code>## $`assets/data_2000`
## [1] &quot;id&quot;        &quot;variable1&quot; &quot;other&quot;     &quot;gender&quot;    &quot;eggs&quot;      &quot;spam&quot;     
## 
## $`assets/data_2001`
## [1] &quot;id&quot;        &quot;variable1&quot; &quot;other&quot;     &quot;gender&quot;    &quot;eggs&quot;      &quot;spam&quot;     
## 
## $`assets/data_2002`
## [1] &quot;id&quot;        &quot;variable1&quot; &quot;other&quot;     &quot;gender&quot;    &quot;eggs&quot;      &quot;spam&quot;     
## 
## $`assets/data_2003`
## [1] &quot;id&quot;        &quot;variable1&quot; &quot;other&quot;     &quot;gender&quot;    &quot;eggs&quot;      &quot;spam&quot;     
## 
## $`assets/data_2004`
## [1] &quot;id&quot;        &quot;variable1&quot; &quot;other&quot;     &quot;gender&quot;    &quot;eggs&quot;      &quot;spam&quot;</code></pre>
<p>This is starting to look like something!</p>
<p>Now, since we removed the years from the column names, we need to add a column containing the year to our datasets. The following function does this:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co">#&#39; Adds the year column</span>
<span class="co">#&#39; @param list_of_datasets A list containing named datasets</span>
<span class="co">#&#39; @return A list of datasets with the year column</span>
<span class="co">#&#39; @description This function works by extracting the year string contained in</span>
<span class="co">#&#39; the data set name and appending a new column to the data set with the numeric</span>
<span class="co">#&#39; value of the year. This means that the data sets have to have a name of the</span>
<span class="co">#&#39; form data_set_2001 or data_2001_europe, etc</span>
<span class="co">#&#39; @export</span>
<span class="co">#&#39; @examples</span>
<span class="co">#&#39; \dontrun{</span>
<span class="co">#&#39; #`list_of_data_sets` is a list containing named data sets</span>
<span class="co">#&#39; # For example, to access the first data set, called dataset_1 you would</span>
<span class="co">#&#39; # write</span>
<span class="co">#&#39; list_of_data_sets$dataset_1</span>
<span class="co">#&#39; add_year_column(list_of_data_sets)</span>
<span class="co">#&#39; }</span>
add_year_column &lt;-<span class="st"> </span><span class="cf">function</span>(list_of_datasets){

  for_one_dataset &lt;-<span class="st"> </span><span class="cf">function</span>(dataset, dataset_name){

    <span class="co"># Split the name of the dataset at every &quot;_&quot; or &quot;.&quot;. The datasets must have a name of the</span>
    <span class="co"># form &quot;data_2000&quot; (notice the underscore).</span>
    name_year &lt;-<span class="st"> </span><span class="kw">unlist</span>(<span class="kw">strsplit</span>(dataset_name, <span class="st">&quot;[_.]&quot;</span>))
    <span class="co"># Get the index of the string that contains digits</span>
    index &lt;-<span class="st"> </span><span class="kw">grep</span>(<span class="st">&quot;</span><span class="ch">\\</span><span class="st">d+&quot;</span>, name_year)

    <span class="co"># Get the year</span>
    year &lt;-<span class="st"> </span><span class="kw">as.numeric</span>(name_year[index])

    <span class="co"># Add it to the data set</span>
    dataset<span class="op">$</span>year &lt;-<span class="st"> </span>year
    <span class="kw">return</span>(dataset)
  }

  output &lt;-<span class="st"> </span>purrr<span class="op">::</span><span class="kw">map2</span>(list_of_datasets, <span class="kw">names</span>(list_of_datasets), for_one_dataset)
  <span class="kw">return</span>(output)
}</code></pre></div>
<p>And its unit test:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(<span class="st">&quot;testthat&quot;</span>)
<span class="kw">library</span>(<span class="st">&quot;prepareData&quot;</span>)
<span class="kw">library</span>(<span class="st">&quot;readr&quot;</span>)


data_sets &lt;-<span class="st"> </span><span class="kw">list.files</span>(<span class="dt">pattern =</span> <span class="st">&quot;^data_&quot;</span>)

data_list &lt;-<span class="st"> </span><span class="kw">read_list</span>(data_sets, read_csv, <span class="dt">col_types =</span> <span class="kw">cols</span>())

<span class="kw">test_that</span>(<span class="st">&quot;Test add year column&quot;</span>,{
    data_list_result &lt;-<span class="st"> </span>purrr<span class="op">::</span><span class="kw">map</span>(data_list, janitor<span class="op">::</span>clean_names)
    data_list_result &lt;-<span class="st"> </span><span class="kw">add_year_column</span>(data_list_result)
    expect &lt;-<span class="st"> </span><span class="kw">list</span>(<span class="kw">rep</span>(<span class="dv">2001</span>, <span class="dv">1000</span>), <span class="kw">rep</span>(<span class="dv">2002</span>, <span class="dv">1000</span>))
    actual &lt;-<span class="st"> </span><span class="kw">list</span>(data_list_result[[<span class="dv">1</span>]]<span class="op">$</span>year, data_list_result[[<span class="dv">2</span>]]<span class="op">$</span>year)
    <span class="kw">expect_equal</span>(expect, actual)
})</code></pre></div>
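<p>To see what happens inside <code>for_one_dataset()</code>, it can help to run its three steps on a single name. The sketch below wraps them in a small function called <code>extract_year()</code>, a hypothetical name used only for this illustration, and applies it to a well-formed name and to a name without a separator.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical wrapper around the three steps of for_one_dataset()
extract_year &lt;- function(dataset_name){
  # Split the name at every &quot;_&quot; or &quot;.&quot;
  name_year &lt;- unlist(strsplit(dataset_name, &quot;[_.]&quot;))
  # Keep the piece(s) that contain digits
  index &lt;- grep(&quot;\\d+&quot;, name_year)
  # as.numeric() warns and returns NA when the piece is not purely numeric;
  # the warning is silenced here to keep the output short
  suppressWarnings(as.numeric(name_year[index]))
}

extract_year(&quot;assets/data_2000&quot;)  # pieces: &quot;assets/data&quot; and &quot;2000&quot;, so 2000
extract_year(&quot;data2000&quot;)          # a single piece, &quot;data2000&quot;, so NA</code></pre></div>
<p>In the second case the function does not stop: the year is simply missing, which is easy to overlook.</p>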
<p>This function does not work if the names of the datasets are not of the form “data_2000”: when the year is not separated from the rest of the name, as in “data2000”, it cannot be converted to a number and the year column is filled with <code>NA</code>. This means that the function should either take an additional argument in which you specify the separator (for example “_”, “.” or even “-”), or fail if the name does not contain an “_”. I like the second solution better:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co">#&#39; Adds the year column</span>
<span class="co">#&#39; @param list_of_datasets A list containing named datasets</span>
<span class="co">#&#39; @return A list of datasets with the year column</span>
<span class="co">#&#39; @description This function works by extracting the year string contained in</span>
<span class="co">#&#39; the data set name and appending a new column to the data set with the numeric</span>
<span class="co">#&#39; value of the year. This means that the data sets have to have a name of the</span>
<span class="co">#&#39; form data_set_2001 or data_2001_europe, etc</span>
<span class="co">#&#39; @export</span>
<span class="co">#&#39; @examples</span>
<span class="co">#&#39; \dontrun{</span>
<span class="co">#&#39; #`list_of_data_sets` is a list containing named data sets</span>
<span class="co">#&#39; # For example, to access the first data set, called dataset_1 you would</span>
<span class="co">#&#39; # write</span>
<span class="co">#&#39; list_of_data_sets$dataset_1</span>
<span class="co">#&#39; add_year_column(list_of_data_sets)</span>
<span class="co">#&#39; }</span>
add_year_column &lt;-<span class="st"> </span><span class="cf">function</span>(list_of_datasets){

  for_one_dataset &lt;-<span class="st"> </span><span class="cf">function</span>(dataset, dataset_name){

    <span class="cf">if</span>(<span class="op">!</span><span class="kw">grepl</span>(<span class="st">&quot;_&quot;</span>, dataset_name, <span class="dt">fixed =</span> <span class="ot">TRUE</span>)){
      <span class="kw">stop</span>(<span class="st">&quot;Make sure that your datasets are named like `data_2000.csv` or similar. &quot;</span>,
           <span class="st">&quot;The `_` between `data` and `2000` is what matters&quot;</span>)
    }

    <span class="co"># Split the name of the dataset at every &quot;_&quot; or &quot;.&quot;. The datasets must have a name of the</span>
    <span class="co"># form &quot;data_2000&quot; (notice the underscore).</span>
    name_year &lt;-<span class="st"> </span><span class="kw">unlist</span>(<span class="kw">strsplit</span>(dataset_name, <span class="dt">split =</span> <span class="st">&quot;[_.]&quot;</span>))
    <span class="co"># Get the index of the string that contains digits</span>
    index &lt;-<span class="st"> </span><span class="kw">grep</span>(<span class="st">&quot;</span><span class="ch">\\</span><span class="st">d+&quot;</span>, name_year)

    <span class="co"># Get the year</span>
    year &lt;-<span class="st"> </span><span class="kw">as.numeric</span>(name_year[index])

    <span class="co"># Add it to the data set</span>
    dataset<span class="op">$</span>year &lt;-<span class="st"> </span>year
    <span class="kw">return</span>(dataset)
  }

  output &lt;-<span class="st"> </span>purrr<span class="op">::</span><span class="kw">map2</span>(list_of_datasets, <span class="kw">names</span>(list_of_datasets), for_one_dataset)
  <span class="kw">return</span>(output)
}</code></pre></div>
<p>If you check the coverage of this function, you will see that the lines that check whether the datasets are correctly named are never executed. Let’s add a unit test that does this, but first, we need to create <em>wrong</em> datasets. Just copy the datasets you have in your tests folder, and rename the copies to <code>wrongdata2001.csv</code> and <code>wrongdata2002.csv</code>. We expect our function to stop with an error message if it tries anything on these datasets:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">data_sets &lt;-<span class="st"> </span><span class="kw">list.files</span>(<span class="dt">pattern =</span> <span class="st">&quot;wrong&quot;</span>)

data_list &lt;-<span class="st"> </span><span class="kw">read_list</span>(data_sets, read_csv, <span class="dt">col_types =</span> <span class="kw">cols</span>())

<span class="kw">test_that</span>(<span class="st">&quot;Test add year column: wrong name&quot;</span>,{
    data_list_result &lt;-<span class="st"> </span>purrr<span class="op">::</span><span class="kw">map</span>(data_list, janitor<span class="op">::</span>clean_names)
    <span class="kw">expect_error</span>(<span class="kw">add_year_column</span>(data_list_result))
})</code></pre></div>
<p>Now you have fully covered your function, and you also know when the function breaks. With the informative error message, future you or your coworkers will know how to correctly name the datasets. Let’s try <code>add_year_column()</code> to see how it behaves on our data:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">datasets &lt;-<span class="st"> </span><span class="kw">add_year_column</span>(datasets)

<span class="kw">map</span>(datasets, head)</code></pre></div>
<pre><code>## $`assets/data_2000`
## # A tibble: 6 x 7
##      id variable1 other gender  eggs          spam  year
##   &lt;int&gt;     &lt;int&gt; &lt;int&gt;  &lt;chr&gt; &lt;int&gt;         &lt;chr&gt; &lt;dbl&gt;
## 1     1        32     3      F    80 -1.5035369157  2000
## 2     2        28     2      F    20 -0.1836726393  2000
## 3     3        36     4      M    58 -0.6851988608  2000
## 4     4        28     1      F    30  1.9900760191  2000
## 5     5        34     3      F    14  0.4324725273  2000
## 6     6        30     3      F    40   -0.79001853  2000
## 
## $`assets/data_2001`
## # A tibble: 6 x 7
##      id variable1 other gender  eggs       spam  year
##   &lt;int&gt;     &lt;int&gt; &lt;int&gt;  &lt;chr&gt; &lt;int&gt;      &lt;dbl&gt; &lt;dbl&gt;
## 1     1        32     3      F    80 -1.5035369  2001
## 2     2        28     2      F    20 -0.1836726  2001
## 3     3        36     4      M    58 -0.6851989  2001
## 4     4        28     1      F    30  1.9900760  2001
## 5     5        34     3      F    14  0.4324725  2001
## 6     6        30     3      F    40 -0.7900185  2001
## 
## $`assets/data_2002`
## # A tibble: 6 x 7
##      id variable1 other gender  eggs       spam  year
##   &lt;int&gt;     &lt;int&gt; &lt;int&gt;  &lt;chr&gt; &lt;int&gt;      &lt;dbl&gt; &lt;dbl&gt;
## 1     1        32     3      F    80 -1.5035369  2002
## 2     2        28     2      F    20 -0.1836726  2002
## 3     3        36     4      M    58 -0.6851989  2002
## 4     4        28     1      F    30  1.9900760  2002
## 5     5        34     3      F    14  0.4324725  2002
## 6     6        30     3      F    40 -0.7900185  2002
## 
## $`assets/data_2003`
## # A tibble: 6 x 7
##      id variable1 other gender  eggs       spam  year
##   &lt;int&gt;     &lt;int&gt; &lt;int&gt;  &lt;chr&gt; &lt;int&gt;      &lt;dbl&gt; &lt;dbl&gt;
## 1     1        32     3      F    80 -1.5035369  2003
## 2     2        28     2      F    20 -0.1836726  2003
## 3     3        36     4      M    58 -0.6851989  2003
## 4     4        28     1      F    30  1.9900760  2003
## 5     5        34     3      F    14  0.4324725  2003
## 6     6        30     3      F    40 -0.7900185  2003
## 
## $`assets/data_2004`
## # A tibble: 6 x 7
##      id variable1 other gender  eggs       spam  year
##   &lt;int&gt;     &lt;int&gt; &lt;int&gt;  &lt;chr&gt; &lt;int&gt;      &lt;dbl&gt; &lt;dbl&gt;
## 1     1        32     3      F    80 -1.5035369  2004
## 2     2        28     2      F    20 -0.1836726  2004
## 3     3        36     4      M    58 -0.6851989  2004
## 4     4        28     1      F    30  1.9900760  2004
## 5     5        34     3      F    14  0.4324725  2004
## 6     6        30     3      F    40 -0.7900185  2004</code></pre>
<p>Just as expected!</p>
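<p>The remaining step from the list above is merging the datasets. As a rough sketch of what this could look like, the example below stacks two toy data frames with base R. The object names <code>toy_datasets</code> and <code>merged</code> are made up, and the sketch assumes that all the datasets share the same columns. Notice that in the output above, <code>spam</code> was read as a character column for the year 2000 and as a numeric column for the other years, so the types are aligned before stacking.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Two toy datasets standing in for the cleaned list; in the first one spam is
# stored as text, as in the dataset for the year 2000.
toy_datasets &lt;- list(
  data_2000 = data.frame(id = 1:2, spam = c(&quot;-1.50&quot;, &quot;-0.18&quot;), year = 2000,
                         stringsAsFactors = FALSE),
  data_2001 = data.frame(id = 1:2, spam = c(-1.50, -0.18), year = 2001)
)

# Make the column types agree before stacking the datasets
toy_datasets &lt;- lapply(toy_datasets, function(dataset){
  dataset$spam &lt;- as.numeric(dataset$spam)
  dataset
})

# Stack the rows of every dataset into one long data frame
merged &lt;- do.call(rbind, toy_datasets)
rownames(merged) &lt;- NULL

merged</code></pre></div>
<p>The result has one row per individual and year, which is the longitudinal format we were aiming for.</p>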
<p>TBC…</p>

</div>
</div>
</div>
<h3>References</h3>
<div id="refs" class="references">
<div id="ref-wickham2014tidy">
<p>Wickham, Hadley. 2014b. “Tidy Data.” <em>Journal of Statistical Software</em> 59 (10): 1–23. doi:<a href="https://doi.org/10.18637/jss.v059.i10">10.18637/jss.v059.i10</a>.</p>
</div>
</div>
<div class="footnotes">
<hr />
<ol start="2">
<li id="fn2"><p>It stands for <code>Bruno Rodrigues' Tools</code>. I’m still working on releasing the package on Github, and maybe CRAN.<a href="putting-it-all-together-writing-a-package-to-work-on-data.html#fnref2">↩</a></p></li>
</ol>
</div>
            </section>

          </div>
        </div>
      </div>
    </div>
  </div>
<script src="libs/gitbook-2.6.7/js/app.min.js"></script>
<script src="libs/gitbook-2.6.7/js/lunr.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-search.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-sharing.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-fontsettings.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-bookdown.js"></script>
<script src="libs/gitbook-2.6.7/js/jquery.highlight.js"></script>
<script>
gitbook.require(["gitbook"], function(gitbook) {
gitbook.start({
"sharing": {
"github": false,
"facebook": true,
"twitter": true,
"google": false,
"weibo": false,
"instapper": false,
"vk": false,
"all": ["facebook", "google", "twitter", "weibo", "instapaper"]
},
"fontsettings": {
"theme": "white",
"family": "sans",
"size": 2
},
"edit": {
"link": "https://github.com/rstudio/bookdown-demo/edit/master/07-all_together.Rmd",
"text": "Edit"
},
"download": ["fp_tdd_data.pdf"],
"toc": {
"collapse": "subsection"
}
});
});
</script>

</body>

</html>