From b153c215808bcc7e7d19b2de60d3e01ed94d9ea0 Mon Sep 17 00:00:00 2001
From: jennybc <jenny@stat.ubc.ca>
Date: Wed, 5 Oct 2016 16:41:17 -0700
Subject: [PATCH] thoughts about working on vectors outside and inside tibbles

---
 block031_vector-tibble-relations.Rmd  | 123 +++++++++
 block031_vector-tibble-relations.html | 349 ++++++++++++++++++++++++++
 block031_vector-tibble-relations.md   | 214 ++++++++++++++++
 3 files changed, 686 insertions(+)
 create mode 100644 block031_vector-tibble-relations.Rmd
 create mode 100644 block031_vector-tibble-relations.html
 create mode 100644 block031_vector-tibble-relations.md

diff --git a/block031_vector-tibble-relations.Rmd b/block031_vector-tibble-relations.Rmd
new file mode 100644
index 00000000..dfd62ae2
--- /dev/null
+++ b/block031_vector-tibble-relations.Rmd
@@ -0,0 +1,123 @@
+---
+title: "Vectors versus tibbles"
+output:
+  html_document:
+    toc: true
+    toc_depth: 4
+---
+
+```{r setup, include = FALSE, cache = FALSE}
+knitr::opts_chunk$set(error = TRUE, collapse = TRUE, comment = "#>")
+```
+
+In STAT 545 and the Master of Data Science program, we teach data analysis starting with a clean data frame (or tibble), an excerpt of the [Gapminder](http://www.gapminder.org) data, from the [gapminder package](https://github.com/jennybc/gapminder). We study the tibble's extent and variable types. We practice filtering, selecting, arranging, summarizing, mutating, and visualizing.
+
+Then we gradually start to reveal the more complicated operations needed to produce such clean data, which requires working with various types of atomic vectors in R, such as character or factor, and even with other data structures altogether, such as matrices or lists.
+
+Two questions keep coming up:
+
+  * Why are you showing me how to do things to vectors two different ways: as naked vectors and as vectors inside a data frame?
+  * What's the general workflow for working on naked vectors vs vectors inside a data frame?
+
+### Load packages
+
+In our examples, we'll use various core tidyverse packages and stringr.
+
+```{r}
+library(tidyverse)
+library(stringr)
+```
+
+### Vector operation example
+
+Consider example `table3` from the `tidyr` package.
+
+```{r}
+table3
+```
+
+It gives the rate of new tuberculosis cases for 3 countries in two years. But the `rate` variable needs to be split into the numerator and denominator and converted to numeric if we really want to work with this data, rather than just gaze upon it.
+
+Here's one way to do that if you pull the `rate` variable out of the table via `$`.
+
+```{r}
+table3$rate %>%
+  str_split_fixed(pattern = "/", n = 2)
+```
+
+But now what? You've got a character matrix that is disassociated with `table3`. You've got more work to do before you can move on.
+
+When a variable needs lots of remedial work, this workflow is justified and unavoidable. But in many cases, you can fix variables "in place" and work inside the original tibble.
+
+### Tibble operation example
+
+If we want to stay in the world of data frames or tibbles, we can get a much nicer result with `tidyr::separate()`.
+
+```{r}
+table3 %>% 
+  separate(rate, into = c("cases", "population"), convert = TRUE)
+```
+
+This gives us a modified version of `table3` but with `rate` removed and replaced by proper integer variables with the numerator (`cases`) and the denominator (`population`).
+
+We are immediately ready to move on to further analysis or visualiation. When feasible, it is generally advisable to manipulate vectors inside the data frame where they live.
+
+### Workarounds and failure
+
+The above was a carefully chosen example, where the splitting functions existed in both settings. In general, you have more flexibility with operations for naked vectors than with vectors inside a tibble. Luckily there are many situations in which you can still manipulate a vector inside a tibble. Use `mutate()`!
+
+Let's convert `country` from character to factor in `table1`, the tidy version of this example dataset:
+
+```{r}
+table1 %>% 
+  mutate(country = factor(country))
+```
+
+The `mutate()` strategy even works when multiple vectors form the necessary input to create a single new vector. Going backwards, we could re-create the vexing `rate` variable "by hand" from `cases` and `population`:
+
+```{r}
+table1 %>% 
+  mutate(rate = paste(as.character(cases), as.character(population), sep = "/")) %>% 
+  select(-cases, -population)
+```
+
+Finally, if it's easier for your development process, you can always pull a variable out, work on it, then put it back in. What if we were crazy and preferred to have the year given in Roman numerals?
+
+```{r}
+## make a copy so I don't mess with table1
+tmp_df <- table1
+## create working copy of the variable of interest
+(tmp_var <- tmp_df$year)
+## do my thing ...please just pretend it's way more complicated ;)
+(tmp_var <- as.roman(tmp_var))
+## put it back into the tibble
+tmp_df$year <- tmp_var
+## admire our work
+tmp_df
+```
+
+Let's confront one genuinely fiddly scenario: what if you want to convert one (or more) existing variables into two (or more) new variables? And you aren't lucky enough to have the perfect function, like `tidyr::separate()`, available?
+
+Unfortunately, `mutate()` doesn't do exactly what you'd want.
+
+```{r}
+table3 %>% 
+  mutate(rate = str_split(rate, pattern = "/"))
+```
+
+Here the character strings with `cases`, and `population` are being stored as character vectors of length two inside a list-column, which is awkward to unpack.
+
+You would be better off to apply `str_split()` outside the tibble, make that into a tibble, then column bind it back in.
+
+```{r}
+new_vars <- table3$rate %>%
+  str_split_fixed(pattern = "/", n = 2) %>% 
+  as_tibble()
+colnames(new_vars) <- c("cases", "population")
+new_vars
+table3 %>% 
+  select(-rate) %>% 
+  bind_cols(new_vars)
+```
+
+One day, it's likely this will be easier (multi-mutate?) But we're not there yet.
diff --git a/block031_vector-tibble-relations.html b/block031_vector-tibble-relations.html
new file mode 100644
index 00000000..25d8ac93
--- /dev/null
+++ b/block031_vector-tibble-relations.html
@@ -0,0 +1,349 @@
+<!DOCTYPE html>
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+
+<meta charset="utf-8">
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="pandoc" />
+
+
+
+
+<title>Vectors versus tibbles</title>
+
+<script src="libs/jquery-1.11.3/jquery.min.js"></script>
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="libs/bootstrap-3.3.5/css/bootstrap.min.css" rel="stylesheet" />
+<script src="libs/bootstrap-3.3.5/js/bootstrap.min.js"></script>
+<script src="libs/bootstrap-3.3.5/shim/html5shiv.min.js"></script>
+<script src="libs/bootstrap-3.3.5/shim/respond.min.js"></script>
+<script>
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+  ga('create', 'UA-68219208-1', 'auto');
+  ga('send', 'pageview');
+
+</script>
+
+<style type="text/css">code{white-space: pre;}</style>
+<link rel="stylesheet"
+      href="libs/highlight/default.css"
+      type="text/css" />
+<script src="libs/highlight/highlight.js"></script>
+<style type="text/css">
+  pre:not([class]) {
+    background-color: white;
+  }
+</style>
+<script type="text/javascript">
+if (window.hljs && document.readyState && document.readyState === "complete") {
+   window.setTimeout(function() {
+      hljs.initHighlighting();
+   }, 0);
+}
+</script>
+
+
+
+<style type="text/css">
+h1 {
+  font-size: 34px;
+}
+h1.title {
+  font-size: 38px;
+}
+h2 {
+  font-size: 30px;
+}
+h3 {
+  font-size: 24px;
+}
+h4 {
+  font-size: 18px;
+}
+h5 {
+  font-size: 16px;
+}
+h6 {
+  font-size: 12px;
+}
+.table th:not([align]) {
+  text-align: left;
+}
+</style>
+
+<link rel="stylesheet" href="libs/local/main.css" type="text/css" />
+<link rel="stylesheet" href="libs/local/nav.css" type="text/css" />
+<link rel="stylesheet" href="//netdna.bootstrapcdn.com/font-awesome/4.0.3/css/font-awesome.css" type="text/css" />
+
+</head>
+
+<body>
+
+<style type = "text/css">
+.main-container {
+  max-width: 940px;
+  margin-left: auto;
+  margin-right: auto;
+}
+code {
+  color: inherit;
+  background-color: rgba(0, 0, 0, 0.04);
+}
+img {
+  max-width:100%;
+  height: auto;
+}
+.tabbed-pane {
+  padding-top: 12px;
+}
+button.code-folding-btn:focus {
+  outline: none;
+}
+</style>
+
+
+
+<div class="container-fluid main-container">
+
+<!-- tabsets -->
+<script src="libs/navigation-1.1/tabsets.js"></script>
+<script>
+$(document).ready(function () {
+  window.buildTabsets("TOC");
+});
+</script>
+
+<!-- code folding -->
+
+
+
+
+
+<header>
+  <div class="nav">
+    <a class="nav-logo" href="index.html">
+      <img src="static/img/stat545-logo-s.png" width="70px" height="70px"/>
+    </a>
+    <ul>
+      <li class="home"><a href="index.html">Home</a></li>
+      <li class="faq"><a href="faq.html">FAQ</a></li>
+      <li class="syllabus"><a href="syllabus.html">Syllabus</a></li>
+      <li class="topics"><a href="topics.html">Topics</a></li>
+      <li class="people"><a href="people.html">People</a></li>
+    </ul>
+  </div>
+</header>
+
+<div class="fluid-row" id="header">
+
+
+
+<h1 class="title toc-ignore">Vectors versus tibbles</h1>
+
+</div>
+
+<div id="TOC">
+<ul>
+<li><a href="#load-packages">Load packages</a></li>
+<li><a href="#vector-operation-example">Vector operation example</a></li>
+<li><a href="#tibble-operation-example">Tibble operation example</a></li>
+<li><a href="#workarounds-and-failure">Workarounds and failure</a></li>
+</ul>
+</div>
+
+<p>In STAT 545 and the Master of Data Science program, we teach data analysis starting with a clean data frame (or tibble), an excerpt of the <a href="http://www.gapminder.org">Gapminder</a> data, from the <a href="https://github.com/jennybc/gapminder">gapminder package</a>. We study the tibble’s extent and variable types. We practice filtering, selecting, arranging, summarizing, mutating, and visualizing.</p>
+<p>Then we gradually start to reveal the more complicated operations needed to produce such clean data, which requires working with various types of atomic vectors in R, such as character or factor, and even with other data structures altogether, such as matrices or lists.</p>
+<p>Two questions keep coming up:</p>
+<ul>
+<li>Why are you showing me how to do things to vectors two different ways: as naked vectors and as vectors inside a data frame?</li>
+<li>What’s the general workflow for working on naked vectors vs vectors inside a data frame?</li>
+</ul>
+<div id="load-packages" class="section level3">
+<h3>Load packages</h3>
+<p>In our examples, we’ll use various core tidyverse packages and stringr.</p>
+<pre class="r"><code>library(tidyverse)
+#&gt; Loading tidyverse: ggplot2
+#&gt; Loading tidyverse: tibble
+#&gt; Loading tidyverse: tidyr
+#&gt; Loading tidyverse: readr
+#&gt; Loading tidyverse: purrr
+#&gt; Loading tidyverse: dplyr
+#&gt; Conflicts with tidy packages ----------------------------------------------
+#&gt; filter(): dplyr, stats
+#&gt; lag():    dplyr, stats
+library(stringr)</code></pre>
+</div>
+<div id="vector-operation-example" class="section level3">
+<h3>Vector operation example</h3>
+<p>Consider example <code>table3</code> from the <code>tidyr</code> package.</p>
+<pre class="r"><code>table3
+#&gt; # A tibble: 6 × 3
+#&gt;       country  year              rate
+#&gt; *       &lt;chr&gt; &lt;int&gt;             &lt;chr&gt;
+#&gt; 1 Afghanistan  1999      745/19987071
+#&gt; 2 Afghanistan  2000     2666/20595360
+#&gt; 3      Brazil  1999   37737/172006362
+#&gt; 4      Brazil  2000   80488/174504898
+#&gt; 5       China  1999 212258/1272915272
+#&gt; 6       China  2000 213766/1280428583</code></pre>
+<p>It gives the rate of new tuberculosis cases for 3 countries in two years. But the <code>rate</code> variable needs to be split into the numerator and denominator and converted to numeric if we really want to work with this data, rather than just gaze upon it.</p>
+<p>Here’s one way to do that if you pull the <code>rate</code> variable out of the table via <code>$</code>.</p>
+<pre class="r"><code>table3$rate %&gt;%
+  str_split_fixed(pattern = &quot;/&quot;, n = 2)
+#&gt;      [,1]     [,2]        
+#&gt; [1,] &quot;745&quot;    &quot;19987071&quot;  
+#&gt; [2,] &quot;2666&quot;   &quot;20595360&quot;  
+#&gt; [3,] &quot;37737&quot;  &quot;172006362&quot; 
+#&gt; [4,] &quot;80488&quot;  &quot;174504898&quot; 
+#&gt; [5,] &quot;212258&quot; &quot;1272915272&quot;
+#&gt; [6,] &quot;213766&quot; &quot;1280428583&quot;</code></pre>
+<p>But now what? You’ve got a character matrix that is disassociated with <code>table3</code>. You’ve got more work to do before you can move on.</p>
+<p>When a variable needs lots of remedial work, this workflow is justified and unavoidable. But in many cases, you can fix variables “in place” and work inside the original tibble.</p>
+</div>
+<div id="tibble-operation-example" class="section level3">
+<h3>Tibble operation example</h3>
+<p>If we want to stay in the world of data frames or tibbles, we can get a much nicer result with <code>tidyr::separate()</code>.</p>
+<pre class="r"><code>table3 %&gt;% 
+  separate(rate, into = c(&quot;cases&quot;, &quot;population&quot;), convert = TRUE)
+#&gt; # A tibble: 6 × 4
+#&gt;       country  year  cases population
+#&gt; *       &lt;chr&gt; &lt;int&gt;  &lt;int&gt;      &lt;int&gt;
+#&gt; 1 Afghanistan  1999    745   19987071
+#&gt; 2 Afghanistan  2000   2666   20595360
+#&gt; 3      Brazil  1999  37737  172006362
+#&gt; 4      Brazil  2000  80488  174504898
+#&gt; 5       China  1999 212258 1272915272
+#&gt; 6       China  2000 213766 1280428583</code></pre>
+<p>This gives us a modified version of <code>table3</code> but with <code>rate</code> removed and replaced by proper integer variables with the numerator (<code>cases</code>) and the denominator (<code>population</code>).</p>
+<p>We are immediately ready to move on to further analysis or visualiation. When feasible, it is generally advisable to manipulate vectors inside the data frame where they live.</p>
+</div>
+<div id="workarounds-and-failure" class="section level3">
+<h3>Workarounds and failure</h3>
+<p>The above was a carefully chosen example, where the splitting functions existed in both settings. In general, you have more flexibility with operations for naked vectors than with vectors inside a tibble. Luckily there are many situations in which you can still manipulate a vector inside a tibble. Use <code>mutate()</code>!</p>
+<p>Let’s convert <code>country</code> from character to factor in <code>table1</code>, the tidy version of this example dataset:</p>
+<pre class="r"><code>table1 %&gt;% 
+  mutate(country = factor(country))
+#&gt; # A tibble: 6 × 4
+#&gt;       country  year  cases population
+#&gt;        &lt;fctr&gt; &lt;int&gt;  &lt;int&gt;      &lt;int&gt;
+#&gt; 1 Afghanistan  1999    745   19987071
+#&gt; 2 Afghanistan  2000   2666   20595360
+#&gt; 3      Brazil  1999  37737  172006362
+#&gt; 4      Brazil  2000  80488  174504898
+#&gt; 5       China  1999 212258 1272915272
+#&gt; 6       China  2000 213766 1280428583</code></pre>
+<p>The <code>mutate()</code> strategy even works when multiple vectors form the necessary input to create a single new vector. Going backwards, we could re-create the vexing <code>rate</code> variable “by hand” from <code>cases</code> and <code>population</code>:</p>
+<pre class="r"><code>table1 %&gt;% 
+  mutate(rate = paste(as.character(cases), as.character(population), sep = &quot;/&quot;)) %&gt;% 
+  select(-cases, -population)
+#&gt; # A tibble: 6 × 3
+#&gt;       country  year              rate
+#&gt;         &lt;chr&gt; &lt;int&gt;             &lt;chr&gt;
+#&gt; 1 Afghanistan  1999      745/19987071
+#&gt; 2 Afghanistan  2000     2666/20595360
+#&gt; 3      Brazil  1999   37737/172006362
+#&gt; 4      Brazil  2000   80488/174504898
+#&gt; 5       China  1999 212258/1272915272
+#&gt; 6       China  2000 213766/1280428583</code></pre>
+<p>Finally, if it’s easier for your development process, you can always pull a variable out, work on it, then put it back in. What if we were crazy and preferred to have the year given in Roman numerals?</p>
+<pre class="r"><code>## make a copy so I don&#39;t mess with table1
+tmp_df &lt;- table1
+## create working copy of the variable of interest
+(tmp_var &lt;- tmp_df$year)
+#&gt; [1] 1999 2000 1999 2000 1999 2000
+## do my thing ...please just pretend it&#39;s way more complicated ;)
+(tmp_var &lt;- as.roman(tmp_var))
+#&gt; [1] MCMXCIX MM      MCMXCIX MM      MCMXCIX MM
+## put it back into the tibble
+tmp_df$year &lt;- tmp_var
+## admire our work
+tmp_df
+#&gt; # A tibble: 6 × 4
+#&gt;       country        year  cases population
+#&gt;         &lt;chr&gt; &lt;S3: roman&gt;  &lt;int&gt;      &lt;int&gt;
+#&gt; 1 Afghanistan     MCMXCIX    745   19987071
+#&gt; 2 Afghanistan          MM   2666   20595360
+#&gt; 3      Brazil     MCMXCIX  37737  172006362
+#&gt; 4      Brazil          MM  80488  174504898
+#&gt; 5       China     MCMXCIX 212258 1272915272
+#&gt; 6       China          MM 213766 1280428583</code></pre>
+<p>Let’s confront one genuinely fiddly scenario: what if you want to convert one (or more) existing variables into two (or more) new variables? And you aren’t lucky enough to have the perfect function, like <code>tidyr::separate()</code>, available?</p>
+<p>Unfortunately, <code>mutate()</code> doesn’t do exactly what you’d want.</p>
+<pre class="r"><code>table3 %&gt;% 
+  mutate(rate = str_split(rate, pattern = &quot;/&quot;))
+#&gt; # A tibble: 6 × 3
+#&gt;       country  year      rate
+#&gt;         &lt;chr&gt; &lt;int&gt;    &lt;list&gt;
+#&gt; 1 Afghanistan  1999 &lt;chr [2]&gt;
+#&gt; 2 Afghanistan  2000 &lt;chr [2]&gt;
+#&gt; 3      Brazil  1999 &lt;chr [2]&gt;
+#&gt; 4      Brazil  2000 &lt;chr [2]&gt;
+#&gt; 5       China  1999 &lt;chr [2]&gt;
+#&gt; 6       China  2000 &lt;chr [2]&gt;</code></pre>
+<p>Here the character strings with <code>cases</code>, and <code>population</code> are being stored as character vectors of length two inside a list-column, which is awkward to unpack.</p>
+<p>You would be better off to apply <code>str_split()</code> outside the tibble, make that into a tibble, then column bind it back in.</p>
+<pre class="r"><code>new_vars &lt;- table3$rate %&gt;%
+  str_split_fixed(pattern = &quot;/&quot;, n = 2) %&gt;% 
+  as_tibble()
+colnames(new_vars) &lt;- c(&quot;cases&quot;, &quot;population&quot;)
+new_vars
+#&gt; # A tibble: 6 × 2
+#&gt;    cases population
+#&gt;    &lt;chr&gt;      &lt;chr&gt;
+#&gt; 1    745   19987071
+#&gt; 2   2666   20595360
+#&gt; 3  37737  172006362
+#&gt; 4  80488  174504898
+#&gt; 5 212258 1272915272
+#&gt; 6 213766 1280428583
+table3 %&gt;% 
+  select(-rate) %&gt;% 
+  bind_cols(new_vars)
+#&gt; # A tibble: 6 × 4
+#&gt;       country  year  cases population
+#&gt;         &lt;chr&gt; &lt;int&gt;  &lt;chr&gt;      &lt;chr&gt;
+#&gt; 1 Afghanistan  1999    745   19987071
+#&gt; 2 Afghanistan  2000   2666   20595360
+#&gt; 3      Brazil  1999  37737  172006362
+#&gt; 4      Brazil  2000  80488  174504898
+#&gt; 5       China  1999 212258 1272915272
+#&gt; 6       China  2000 213766 1280428583</code></pre>
+<p>One day, it’s likely this will be easier (multi-mutate?) But we’re not there yet.</p>
+</div>
+
+<div class="footer">
+This work is licensed under the  <a href="http://creativecommons.org/licenses/by-nc/3.0/">CC BY-NC 3.0 Creative Commons License</a>.
+</div>
+
+
+
+</div>
+
+<script>
+
+// add bootstrap table styles to pandoc tables
+$(document).ready(function () {
+  $('tr.header').parent('thead').parent('table').addClass('table table-condensed');
+});
+
+
+</script>
+
+<!-- dynamically load mathjax for compatibility with self-contained -->
+<script>
+  (function () {
+    var script = document.createElement("script");
+    script.type = "text/javascript";
+    script.src  = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+    document.getElementsByTagName("head")[0].appendChild(script);
+  })();
+</script>
+
+</body>
+</html>
diff --git a/block031_vector-tibble-relations.md b/block031_vector-tibble-relations.md
new file mode 100644
index 00000000..b54afc46
--- /dev/null
+++ b/block031_vector-tibble-relations.md
@@ -0,0 +1,214 @@
+# Vectors versus tibbles
+
+
+
+In STAT 545 and the Master of Data Science program, we teach data analysis starting with a clean data frame (or tibble), an excerpt of the [Gapminder](http://www.gapminder.org) data, from the [gapminder package](https://github.com/jennybc/gapminder). We study the tibble's extent and variable types. We practice filtering, selecting, arranging, summarizing, mutating, and visualizing.
+
+Then we gradually start to reveal the more complicated operations needed to produce such clean data, which requires working with various types of atomic vectors in R, such as character or factor, and even with other data structures altogether, such as matrices or lists.
+
+Two questions keep coming up:
+
+  * Why are you showing me how to do things to vectors two different ways: as naked vectors and as vectors inside a data frame?
+  * What's the general workflow for working on naked vectors vs vectors inside a data frame?
+
+### Load packages
+
+In our examples, we'll use various core tidyverse packages and stringr.
+
+
+```r
+library(tidyverse)
+#> Loading tidyverse: ggplot2
+#> Loading tidyverse: tibble
+#> Loading tidyverse: tidyr
+#> Loading tidyverse: readr
+#> Loading tidyverse: purrr
+#> Loading tidyverse: dplyr
+#> Conflicts with tidy packages ----------------------------------------------
+#> filter(): dplyr, stats
+#> lag():    dplyr, stats
+library(stringr)
+```
+
+### Vector operation example
+
+Consider example `table3` from the `tidyr` package.
+
+
+```r
+table3
+#> # A tibble: 6 × 3
+#>       country  year              rate
+#> *       <chr> <int>             <chr>
+#> 1 Afghanistan  1999      745/19987071
+#> 2 Afghanistan  2000     2666/20595360
+#> 3      Brazil  1999   37737/172006362
+#> 4      Brazil  2000   80488/174504898
+#> 5       China  1999 212258/1272915272
+#> 6       China  2000 213766/1280428583
+```
+
+It gives the rate of new tuberculosis cases for 3 countries in two years. But the `rate` variable needs to be split into the numerator and denominator and converted to numeric if we really want to work with this data, rather than just gaze upon it.
+
+Here's one way to do that if you pull the `rate` variable out of the table via `$`.
+
+
+```r
+table3$rate %>%
+  str_split_fixed(pattern = "/", n = 2)
+#>      [,1]     [,2]        
+#> [1,] "745"    "19987071"  
+#> [2,] "2666"   "20595360"  
+#> [3,] "37737"  "172006362" 
+#> [4,] "80488"  "174504898" 
+#> [5,] "212258" "1272915272"
+#> [6,] "213766" "1280428583"
+```
+
+But now what? You've got a character matrix that is disassociated with `table3`. You've got more work to do before you can move on.
+
+When a variable needs lots of remedial work, this workflow is justified and unavoidable. But in many cases, you can fix variables "in place" and work inside the original tibble.
+
+### Tibble operation example
+
+If we want to stay in the world of data frames or tibbles, we can get a much nicer result with `tidyr::separate()`.
+
+
+```r
+table3 %>% 
+  separate(rate, into = c("cases", "population"), convert = TRUE)
+#> # A tibble: 6 × 4
+#>       country  year  cases population
+#> *       <chr> <int>  <int>      <int>
+#> 1 Afghanistan  1999    745   19987071
+#> 2 Afghanistan  2000   2666   20595360
+#> 3      Brazil  1999  37737  172006362
+#> 4      Brazil  2000  80488  174504898
+#> 5       China  1999 212258 1272915272
+#> 6       China  2000 213766 1280428583
+```
+
+This gives us a modified version of `table3` but with `rate` removed and replaced by proper integer variables with the numerator (`cases`) and the denominator (`population`).
+
+We are immediately ready to move on to further analysis or visualiation. When feasible, it is generally advisable to manipulate vectors inside the data frame where they live.
+
+### Workarounds and failure
+
+The above was a carefully chosen example, where the splitting functions existed in both settings. In general, you have more flexibility with operations for naked vectors than with vectors inside a tibble. Luckily there are many situations in which you can still manipulate a vector inside a tibble. Use `mutate()`!
+
+Let's convert `country` from character to factor in `table1`, the tidy version of this example dataset:
+
+
+```r
+table1 %>% 
+  mutate(country = factor(country))
+#> # A tibble: 6 × 4
+#>       country  year  cases population
+#>        <fctr> <int>  <int>      <int>
+#> 1 Afghanistan  1999    745   19987071
+#> 2 Afghanistan  2000   2666   20595360
+#> 3      Brazil  1999  37737  172006362
+#> 4      Brazil  2000  80488  174504898
+#> 5       China  1999 212258 1272915272
+#> 6       China  2000 213766 1280428583
+```
+
+The `mutate()` strategy even works when multiple vectors form the necessary input to create a single new vector. Going backwards, we could re-create the vexing `rate` variable "by hand" from `cases` and `population`:
+
+
+```r
+table1 %>% 
+  mutate(rate = paste(as.character(cases), as.character(population), sep = "/")) %>% 
+  select(-cases, -population)
+#> # A tibble: 6 × 3
+#>       country  year              rate
+#>         <chr> <int>             <chr>
+#> 1 Afghanistan  1999      745/19987071
+#> 2 Afghanistan  2000     2666/20595360
+#> 3      Brazil  1999   37737/172006362
+#> 4      Brazil  2000   80488/174504898
+#> 5       China  1999 212258/1272915272
+#> 6       China  2000 213766/1280428583
+```
+
+Finally, if it's easier for your development process, you can always pull a variable out, work on it, then put it back in. What if we were crazy and preferred to have the year given in Roman numerals?
+
+
+```r
+## make a copy so I don't mess with table1
+tmp_df <- table1
+## create working copy of the variable of interest
+(tmp_var <- tmp_df$year)
+#> [1] 1999 2000 1999 2000 1999 2000
+## do my thing ...please just pretend it's way more complicated ;)
+(tmp_var <- as.roman(tmp_var))
+#> [1] MCMXCIX MM      MCMXCIX MM      MCMXCIX MM
+## put it back into the tibble
+tmp_df$year <- tmp_var
+## admire our work
+tmp_df
+#> # A tibble: 6 × 4
+#>       country        year  cases population
+#>         <chr> <S3: roman>  <int>      <int>
+#> 1 Afghanistan     MCMXCIX    745   19987071
+#> 2 Afghanistan          MM   2666   20595360
+#> 3      Brazil     MCMXCIX  37737  172006362
+#> 4      Brazil          MM  80488  174504898
+#> 5       China     MCMXCIX 212258 1272915272
+#> 6       China          MM 213766 1280428583
+```
+
+Let's confront one genuinely fiddly scenario: what if you want to convert one (or more) existing variables into two (or more) new variables? And you aren't lucky enough to have the perfect function, like `tidyr::separate()`, available?
+
+Unfortunately, `mutate()` doesn't do exactly what you'd want.
+
+
+```r
+table3 %>% 
+  mutate(rate = str_split(rate, pattern = "/"))
+#> # A tibble: 6 × 3
+#>       country  year      rate
+#>         <chr> <int>    <list>
+#> 1 Afghanistan  1999 <chr [2]>
+#> 2 Afghanistan  2000 <chr [2]>
+#> 3      Brazil  1999 <chr [2]>
+#> 4      Brazil  2000 <chr [2]>
+#> 5       China  1999 <chr [2]>
+#> 6       China  2000 <chr [2]>
+```
+
+Here the character strings with `cases`, and `population` are being stored as character vectors of length two inside a list-column, which is awkward to unpack.
+
+You would be better off to apply `str_split()` outside the tibble, make that into a tibble, then column bind it back in.
+
+
+```r
+new_vars <- table3$rate %>%
+  str_split_fixed(pattern = "/", n = 2) %>% 
+  as_tibble()
+colnames(new_vars) <- c("cases", "population")
+new_vars
+#> # A tibble: 6 × 2
+#>    cases population
+#>    <chr>      <chr>
+#> 1    745   19987071
+#> 2   2666   20595360
+#> 3  37737  172006362
+#> 4  80488  174504898
+#> 5 212258 1272915272
+#> 6 213766 1280428583
+table3 %>% 
+  select(-rate) %>% 
+  bind_cols(new_vars)
+#> # A tibble: 6 × 4
+#>       country  year  cases population
+#>         <chr> <int>  <chr>      <chr>
+#> 1 Afghanistan  1999    745   19987071
+#> 2 Afghanistan  2000   2666   20595360
+#> 3      Brazil  1999  37737  172006362
+#> 4      Brazil  2000  80488  174504898
+#> 5       China  1999 212258 1272915272
+#> 6       China  2000 213766 1280428583
+```
+
+One day, it's likely this will be easier (multi-mutate?) But we're not there yet.