This repository has been archived by the owner on Sep 18, 2019. It is now read-only.

save partial and unused data manipulation block
jennybc committed Oct 25, 2014
1 parent 286516b commit af62ddb
Showing 3 changed files with 393 additions and 0 deletions.
229 changes: 229 additions & 0 deletions block021_data-manipulation-capstone.html
@@ -0,0 +1,229 @@
<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="pandoc" />



<title>Exercises to test and solidify your data manipulation skills</title>

<script src="libs/jquery-1.11.0/jquery.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link href="libs/bootstrap-2.3.2/css/united.min.css" rel="stylesheet" />
<link href="libs/bootstrap-2.3.2/css/bootstrap-responsive.min.css" rel="stylesheet" />
<script src="libs/bootstrap-2.3.2/js/bootstrap.min.js"></script>

<style type="text/css">code{white-space: pre;}</style>
<link rel="stylesheet"
href="libs/highlight/default.css"
type="text/css" />
<script src="libs/highlight/highlight.js"></script>
<style type="text/css">
pre:not([class]) {
background-color: white;
}
</style>
<script type="text/javascript">
if (window.hljs && document.readyState && document.readyState === "complete") {
window.setTimeout(function() {
hljs.initHighlighting();
}, 0);
}
</script>


<link rel="stylesheet" href="libs/local/nav.css" type="text/css" />

</head>

<body>

<style type = "text/css">
.main-container {
max-width: 940px;
margin-left: auto;
margin-right: auto;
}
</style>
<div class="container-fluid main-container">

<header>
<div class="nav">
<a class="nav-logo" href="index.html">
<img src="static/img/stat545-logo-s.png" width="70px" height="70px"/>
</a>
<ul>
<li class="home"><a href="index.html">Home</a></li>
<li class="faq"><a href="faq.html">FAQ</a></li>
<li class="syllabus"><a href="syllabus.html">Syllabus</a></li>
<li class="topics"><a href="topics.html">Topics</a></li>
<li class="people"><a href="people.html">People</a></li>
</ul>
</div>
</header>

<div id="header">
<h1 class="title">Exercises to test and solidify your data manipulation skills</h1>
</div>

<div id="TOC">
<ul>
<li><a href="#aggregate-or-summarize">Aggregate or summarize</a></li>
<li><a href="#cross-tabulate-with-holes">Cross-tabulate with holes</a></li>
</ul>
</div>

<p><em>NOTE: Not completed or used. It is a start on a set of data manipulation challenges, but I lost too much time tracking down a puzzle in the <code>spread()</code> example. It turned out to be a bug in <code>dplyr</code>. See <a href="https://github.com/hadley/tidyr/issues/32">this issue</a> or <a href="https://github.com/hadley/tidyr/issues/42">the one I opened and closed</a>.</em></p>
<pre class="r"><code>library(dplyr)
##
## Attaching package: &#39;dplyr&#39;
##
## The following object is masked from &#39;package:stats&#39;:
##
##     filter
##
## The following objects are masked from &#39;package:base&#39;:
##
##     intersect, setdiff, setequal, union
library(reshape2)
library(tidyr)
gdat &lt;- read.delim(&quot;gapminderDataFiveYear.tsv&quot;)</code></pre>
<div id="aggregate-or-summarize" class="section level3">
<h3>Aggregate or summarize</h3>
<p>From this input:</p>
<pre class="r"><code>(hdat &lt;- gdat %&gt;%
   filter(country %in% c(&#39;France&#39;, &#39;Belgium&#39;, &#39;Nigeria&#39;, &#39;Japan&#39;),
          year &gt; 1996) %&gt;%
   select(country, year, continent, lifeExp) %&gt;%
   filter( (country == &#39;Japan&#39;) |
           (country == &#39;Belgium&#39; &amp; year == 2002) |
           (country == &#39;France&#39; &amp; year &lt; 2005) |
           (country == &#39;Nigeria&#39; &amp; year &gt; 2002)))
##   country year continent lifeExp
## 1 Belgium 2002    Europe  78.320
## 2  France 1997    Europe  78.640
## 3  France 2002    Europe  79.590
## 4   Japan 1997      Asia  80.690
## 5   Japan 2002      Asia  82.000
## 6   Japan 2007      Asia  82.603
## 7 Nigeria 2007    Africa  46.859</code></pre>
<p>Make this output:</p>
<table>
<thead>
<tr class="header">
<th align="left">country</th>
<th align="left">continent</th>
<th align="right">nrows</th>
<th align="right">max_year</th>
<th align="right">min_lifeExp</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">Belgium</td>
<td align="left">Europe</td>
<td align="right">1</td>
<td align="right">2002</td>
<td align="right">78.320</td>
</tr>
<tr class="even">
<td align="left">France</td>
<td align="left">Europe</td>
<td align="right">2</td>
<td align="right">2002</td>
<td align="right">78.640</td>
</tr>
<tr class="odd">
<td align="left">Japan</td>
<td align="left">Asia</td>
<td align="right">3</td>
<td align="right">2007</td>
<td align="right">80.690</td>
</tr>
<tr class="even">
<td align="left">Nigeria</td>
<td align="left">Africa</td>
<td align="right">1</td>
<td align="right">2007</td>
<td align="right">46.859</td>
</tr>
</tbody>
</table>
</div>
<div id="cross-tabulate-with-holes" class="section level3">
<h3>Cross-tabulate with holes</h3>
<p>From <code>hdat</code> (created by the code above):</p>
<pre class="r"><code>hdat
##   country year continent lifeExp
## 1 Belgium 2002    Europe  78.320
## 2  France 1997    Europe  78.640
## 3  France 2002    Europe  79.590
## 4   Japan 1997      Asia  80.690
## 5   Japan 2002      Asia  82.000
## 6   Japan 2007      Asia  82.603
## 7 Nigeria 2007    Africa  46.859</code></pre>
<p>Make this output (it should be a data.frame):</p>
<table>
<thead>
<tr class="header">
<th align="left">continent</th>
<th align="right">1997</th>
<th align="right">2002</th>
<th align="right">2007</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">Africa</td>
<td align="right">NA</td>
<td align="right">NA</td>
<td align="right">1</td>
</tr>
<tr class="even">
<td align="left">Asia</td>
<td align="right">1</td>
<td align="right">1</td>
<td align="right">1</td>
</tr>
<tr class="odd">
<td align="left">Europe</td>
<td align="right">1</td>
<td align="right">2</td>
<td align="right">NA</td>
</tr>
</tbody>
</table>
</div>

<div class="footer">
This work is licensed under the <a href="http://creativecommons.org/licenses/by-nc/3.0/">CC BY-NC 3.0 Creative Commons License</a>.
</div>

</div>

<script>

// add bootstrap table styles to pandoc tables
$(document).ready(function () {
$('tr.header').parent('thead').parent('table').addClass('table table-condensed');
});

</script>

<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement("script");
script.type = "text/javascript";
script.src = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
document.getElementsByTagName("head")[0].appendChild(script);
})();
</script>

</body>
</html>
87 changes: 87 additions & 0 deletions block021_data-manipulation-capstone.md
@@ -0,0 +1,87 @@
# Exercises to test and solidify your data manipulation skills

*NOTE: Not completed or used. It is a start on a set of data manipulation challenges, but I lost too much time tracking down a puzzle in the `spread()` example. It turned out to be a bug in `dplyr`. See [this issue](https://github.com/hadley/tidyr/issues/32) or [the one I opened and closed](https://github.com/hadley/tidyr/issues/42).*




```r
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
##     filter
##
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
library(reshape2)
library(tidyr)
gdat <- read.delim("gapminderDataFiveYear.tsv")
```


### Aggregate or summarize

From this input:


```r
(hdat <- gdat %>%
   filter(country %in% c('France', 'Belgium', 'Nigeria', 'Japan'),
          year > 1996) %>%
   select(country, year, continent, lifeExp) %>%
   filter( (country == 'Japan') |
           (country == 'Belgium' & year == 2002) |
           (country == 'France' & year < 2005) |
           (country == 'Nigeria' & year > 2002)))
##   country year continent lifeExp
## 1 Belgium 2002    Europe  78.320
## 2  France 1997    Europe  78.640
## 3  France 2002    Europe  79.590
## 4   Japan 1997      Asia  80.690
## 5   Japan 2002      Asia  82.000
## 6   Japan 2007      Asia  82.603
## 7 Nigeria 2007    Africa  46.859
```

Make this output:




country   continent    nrows   max_year   min_lifeExp
--------  ----------  ------  ---------  ------------
Belgium   Europe           1       2002        78.320
France    Europe           2       2002        78.640
Japan     Asia             3       2007        80.690
Nigeria   Africa           1       2007        46.859
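
As a way to check an answer, the summary can be sketched in base R, with no package dependencies. This is only one possible route, not necessarily the intended one. It assumes `hdat` is re-typed by hand from the printed input above (so the TSV file is not needed), and the result name `idat` is illustrative.

```r
# hdat re-typed from the printed input above (an assumption made so the
# sketch runs without gapminderDataFiveYear.tsv)
hdat <- data.frame(
  country   = c("Belgium", "France", "France", "Japan", "Japan", "Japan", "Nigeria"),
  year      = c(2002, 1997, 2002, 1997, 2002, 2007, 2007),
  continent = c("Europe", "Europe", "Europe", "Asia", "Asia", "Asia", "Africa"),
  lifeExp   = c(78.320, 78.640, 79.590, 80.690, 82.000, 82.603, 46.859),
  stringsAsFactors = FALSE
)

# Split the rows by country, build a one-row summary per piece,
# then stack the pieces back into a single data.frame
pieces <- lapply(split(hdat, hdat$country), function(d) {
  data.frame(country     = d$country[1],
             continent   = d$continent[1],
             nrows       = nrow(d),
             max_year    = max(d$year),
             min_lifeExp = min(d$lifeExp))
})
idat <- do.call(rbind, pieces)
rownames(idat) <- NULL  # drop the country names that rbind() uses as row names
idat
```

Because each country here belongs to exactly one continent, splitting by `country` alone gives the same groups as grouping by `country` and `continent` together.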

### Cross-tabulate with holes

From `hdat` (created by the code above):

```r
hdat
##   country year continent lifeExp
## 1 Belgium 2002    Europe  78.320
## 2  France 1997    Europe  78.640
## 3  France 2002    Europe  79.590
## 4   Japan 1997      Asia  80.690
## 5   Japan 2002      Asia  82.000
## 6   Japan 2007      Asia  82.603
## 7 Nigeria 2007    Africa  46.859
```

Make this output (it should be a data.frame):




continent     1997   2002   2007
----------  -----  -----  -----
Africa         NA     NA      1
Asia            1      1      1
Europe          1      2     NA
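
The "holes" are the continent and year combinations with no rows, which should show as `NA` rather than `0`. A minimal base R sketch of one way to get there follows. It is a cross-check under stated assumptions, not necessarily the intended solution: `hdat` is re-typed by hand from the printed input, and the names `counts` and `kdat` are illustrative.

```r
# hdat re-typed from the printed input above (an assumption made so the
# sketch runs without gapminderDataFiveYear.tsv)
hdat <- data.frame(
  country   = c("Belgium", "France", "France", "Japan", "Japan", "Japan", "Nigeria"),
  year      = c(2002, 1997, 2002, 1997, 2002, 2007, 2007),
  continent = c("Europe", "Europe", "Europe", "Asia", "Asia", "Asia", "Africa"),
  lifeExp   = c(78.320, 78.640, 79.590, 80.690, 82.000, 82.603, 46.859),
  stringsAsFactors = FALSE
)

# Count rows per continent-by-year cell; table() fills empty cells with 0
counts <- as.data.frame.matrix(table(hdat$continent, hdat$year))

# Turn the zero counts into NA so that empty cells become holes
counts[counts == 0] <- NA

# Move the continent names from the row names into an ordinary column
kdat <- cbind(continent = rownames(counts), counts)
rownames(kdat) <- NULL
kdat
```

The result is a plain data.frame whose year columns are named `1997`, `2002`, and `2007`, so they need backticks or `[[ ]]` to be accessed, for example `kdat[["2002"]]`.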
77 changes: 77 additions & 0 deletions block021_data-manipulation-capstone.rmd
@@ -0,0 +1,77 @@
---
title: Exercises to test and solidify your data manipulation skills
output:
html_document:
toc: true
toc_depth: 4
---

*NOTE: Not completed or used. It is a start on a set of data manipulation challenges, but I lost too much time tracking down a puzzle in the `spread()` example. It turned out to be a bug in `dplyr`. See [this issue](https://github.com/hadley/tidyr/issues/32) or [the one I opened and closed](https://github.com/hadley/tidyr/issues/42).*

```{r setup, include = FALSE, cache = FALSE}
knitr::opts_chunk$set(error = TRUE, collapse = TRUE)
```

```{r}
library(dplyr)
library(reshape2)
library(tidyr)
gdat <- read.delim("gapminderDataFiveYear.tsv")
```


### Aggregate or summarize

From this input:

```{r}
(hdat <- gdat %>%
   filter(country %in% c('France', 'Belgium', 'Nigeria', 'Japan'),
          year > 1996) %>%
   select(country, year, continent, lifeExp) %>%
   filter( (country == 'Japan') |
           (country == 'Belgium' & year == 2002) |
           (country == 'France' & year < 2005) |
           (country == 'Nigeria' & year > 2002)))
```

Make this output:

```{r include = FALSE}
idat <- hdat %>%
  group_by(country, continent) %>%
  summarize(nrows = n(),
            max_year = max(year),
            min_lifeExp = min(lifeExp))
```

```{r echo = FALSE}
knitr::kable(as.data.frame(idat))
```

### Cross-tabulate with holes

From `hdat` (created by the code above):
```{r}
hdat
```

Make this output (it should be a data.frame):

```{r include = FALSE}
(jdat <- hdat %>%
   group_by(continent, year) %>%
   tally())
## tidyr::spread()
kdat <- jdat %>%
  ungroup() %>% # necessary temporarily; fix coming to dplyr!
  spread(year, n)
## reshape2::dcast()
#dcast(jdat, continent ~ year, value.var = "n")
```

```{r echo = FALSE}
knitr::kable(as.data.frame(kdat))
```
