This repository has been archived by the owner on Sep 18, 2019. It is now read-only.
Commit
save partial and unused data manipulation block
jennybc committed Oct 25, 2014
1 parent 286516b, commit af62ddb
Showing 3 changed files with 393 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,229 @@ | ||
<!DOCTYPE html> | ||
|
||
<html xmlns="http://www.w3.org/1999/xhtml"> | ||
|
||
<head> | ||
|
||
<meta charset="utf-8"> | ||
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> | ||
<meta name="generator" content="pandoc" /> | ||
|
||
|
||
|
||
<title>Exercises to test and solidify your data manipulation skills</title> | ||
|
||
<script src="libs/jquery-1.11.0/jquery.min.js"></script> | ||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> | ||
<link href="libs/bootstrap-2.3.2/css/united.min.css" rel="stylesheet" /> | ||
<link href="libs/bootstrap-2.3.2/css/bootstrap-responsive.min.css" rel="stylesheet" /> | ||
<script src="libs/bootstrap-2.3.2/js/bootstrap.min.js"></script> | ||
|
||
<style type="text/css">code{white-space: pre;}</style> | ||
<link rel="stylesheet" | ||
href="libs/highlight/default.css" | ||
type="text/css" /> | ||
<script src="libs/highlight/highlight.js"></script> | ||
<style type="text/css"> | ||
pre:not([class]) { | ||
background-color: white; | ||
} | ||
</style> | ||
<script type="text/javascript"> | ||
if (window.hljs && document.readyState && document.readyState === "complete") { | ||
window.setTimeout(function() { | ||
hljs.initHighlighting(); | ||
}, 0); | ||
} | ||
</script> | ||
|
||
|
||
<link rel="stylesheet" href="libs/local/nav.css" type="text/css" /> | ||
|
||
</head> | ||
|
||
<body> | ||
|
||
<style type = "text/css"> | ||
.main-container { | ||
max-width: 940px; | ||
margin-left: auto; | ||
margin-right: auto; | ||
} | ||
</style> | ||
<div class="container-fluid main-container"> | ||
|
||
<header> | ||
<div class="nav"> | ||
<a class="nav-logo" href="index.html"> | ||
<img src="static/img/stat545-logo-s.png" width="70px" height="70px"/> | ||
</a> | ||
<ul> | ||
<li class="home"><a href="index.html">Home</a></li> | ||
<li class="faq"><a href="faq.html">FAQ</a></li> | ||
<li class="syllabus"><a href="syllabus.html">Syllabus</a></li> | ||
<li class="topics"><a href="topics.html">Topics</a></li> | ||
<li class="people"><a href="people.html">People</a></li> | ||
</ul> | ||
</div> | ||
</header> | ||
|
||
<div id="header"> | ||
<h1 class="title">Exercises to test and solidify your data manipulation skills</h1> | ||
</div> | ||
|
||
<div id="TOC"> | ||
<ul> | ||
<li><a href="#aggregate-or-summarize">Aggregate or summarize</a></li> | ||
<li><a href="#cross-tabulate-with-holes">Cross-tabulate with holes</a></li> | ||
</ul> | ||
</div> | ||
|
||
<p><em>NOTE: Not completed or used. It is a start on a set of data manipulation challenges, but I lost too much time tracking down a puzzle in the <code>spread()</code> example. It turned out to be a bug in <code>dplyr</code>. See <a href="https://github.com/hadley/tidyr/issues/32">this issue</a> or <a href="https://github.com/hadley/tidyr/issues/42">the one I opened and closed</a>.</em></p> | ||
<pre class="r"><code>library(dplyr) | ||
## | ||
## Attaching package: 'dplyr' | ||
## | ||
## The following object is masked from 'package:stats': | ||
## | ||
## filter | ||
## | ||
## The following objects are masked from 'package:base': | ||
## | ||
## intersect, setdiff, setequal, union | ||
library(reshape2) | ||
library(tidyr) | ||
gdat <- read.delim("gapminderDataFiveYear.tsv")</code></pre> | ||
<div id="aggregate-or-summarize" class="section level3"> | ||
<h3>Aggregate or summarize</h3> | ||
<p>From this input:</p> | ||
<pre class="r"><code>(hdat <- gdat %>% | ||
  filter(country %in% c('France', 'Belgium', 'Nigeria', 'Japan'), | ||
         year > 1996) %>% | ||
  select(country, year, continent, lifeExp) %>% | ||
  filter( (country == 'Japan') | | ||
          (country == 'Belgium' & year == 2002) | | ||
          (country == 'France' & year < 2005) | | ||
          (country == 'Nigeria' & year > 2002))) | ||
##   country year continent lifeExp | ||
## 1 Belgium 2002    Europe  78.320 | ||
## 2  France 1997    Europe  78.640 | ||
## 3  France 2002    Europe  79.590 | ||
## 4   Japan 1997      Asia  80.690 | ||
## 5   Japan 2002      Asia  82.000 | ||
## 6   Japan 2007      Asia  82.603 | ||
## 7 Nigeria 2007    Africa  46.859</code></pre> | ||
<p>Make this output:</p> | ||
<table> | ||
<thead> | ||
<tr class="header"> | ||
<th align="left">country</th> | ||
<th align="left">continent</th> | ||
<th align="right">nrows</th> | ||
<th align="right">max_year</th> | ||
<th align="right">min_lifeExp</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
<tr class="odd"> | ||
<td align="left">Belgium</td> | ||
<td align="left">Europe</td> | ||
<td align="right">1</td> | ||
<td align="right">2002</td> | ||
<td align="right">78.320</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">France</td> | ||
<td align="left">Europe</td> | ||
<td align="right">2</td> | ||
<td align="right">2002</td> | ||
<td align="right">78.640</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">Japan</td> | ||
<td align="left">Asia</td> | ||
<td align="right">3</td> | ||
<td align="right">2007</td> | ||
<td align="right">80.690</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">Nigeria</td> | ||
<td align="left">Africa</td> | ||
<td align="right">1</td> | ||
<td align="right">2007</td> | ||
<td align="right">46.859</td> | ||
</tr> | ||
</tbody> | ||
</table> | ||
</div> | ||
<div id="cross-tabulate-with-holes" class="section level3"> | ||
<h3>Cross-tabulate with holes</h3> | ||
<p>From <code>hdat</code> (produced by the code above):</p> | ||
<pre class="r"><code>hdat | ||
##   country year continent lifeExp | ||
## 1 Belgium 2002    Europe  78.320 | ||
## 2  France 1997    Europe  78.640 | ||
## 3  France 2002    Europe  79.590 | ||
## 4   Japan 1997      Asia  80.690 | ||
## 5   Japan 2002      Asia  82.000 | ||
## 6   Japan 2007      Asia  82.603 | ||
## 7 Nigeria 2007    Africa  46.859</code></pre> | ||
<p>Make this output (it should be a data.frame):</p> | ||
<table> | ||
<thead> | ||
<tr class="header"> | ||
<th align="left">continent</th> | ||
<th align="right">1997</th> | ||
<th align="right">2002</th> | ||
<th align="right">2007</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
<tr class="odd"> | ||
<td align="left">Africa</td> | ||
<td align="right">NA</td> | ||
<td align="right">NA</td> | ||
<td align="right">1</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">Asia</td> | ||
<td align="right">1</td> | ||
<td align="right">1</td> | ||
<td align="right">1</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">Europe</td> | ||
<td align="right">1</td> | ||
<td align="right">2</td> | ||
<td align="right">NA</td> | ||
</tr> | ||
</tbody> | ||
</table> | ||
</div> | ||
|
||
<div class="footer"> | ||
This work is licensed under the <a href="http://creativecommons.org/licenses/by-nc/3.0/">CC BY-NC 3.0 Creative Commons License</a>. | ||
</div> | ||
|
||
</div> | ||
|
||
<script> | ||
|
||
// add bootstrap table styles to pandoc tables | ||
$(document).ready(function () { | ||
$('tr.header').parent('thead').parent('table').addClass('table table-condensed'); | ||
}); | ||
|
||
</script> | ||
|
||
<!-- dynamically load mathjax for compatibility with self-contained --> | ||
<script> | ||
(function () { | ||
var script = document.createElement("script"); | ||
script.type = "text/javascript"; | ||
script.src = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"; | ||
document.getElementsByTagName("head")[0].appendChild(script); | ||
})(); | ||
</script> | ||
|
||
</body> | ||
</html> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
# Exercises to test and solidify your data manipulation skills | ||
|
||
*NOTE: Not completed or used. It is a start on a set of data manipulation challenges, but I lost too much time tracking down a puzzle in the `spread()` example. It turned out to be a bug in `dplyr`. See [this issue](https://github.com/hadley/tidyr/issues/32) or [the one I opened and closed](https://github.com/hadley/tidyr/issues/42).* | ||
|
||
|
||
|
||
|
||
```r | ||
library(dplyr) | ||
## | ||
## Attaching package: 'dplyr' | ||
## | ||
## The following object is masked from 'package:stats': | ||
## | ||
## filter | ||
## | ||
## The following objects are masked from 'package:base': | ||
## | ||
## intersect, setdiff, setequal, union | ||
library(reshape2) | ||
library(tidyr) | ||
gdat <- read.delim("gapminderDataFiveYear.tsv") | ||
``` | ||
|
||
|
||
### Aggregate or summarize | ||
|
||
From this input: | ||
|
||
|
||
```r | ||
(hdat <- gdat %>% | ||
  filter(country %in% c('France', 'Belgium', 'Nigeria', 'Japan'), | ||
         year > 1996) %>% | ||
  select(country, year, continent, lifeExp) %>% | ||
  filter( (country == 'Japan') | | ||
          (country == 'Belgium' & year == 2002) | | ||
          (country == 'France' & year < 2005) | | ||
          (country == 'Nigeria' & year > 2002))) | ||
##   country year continent lifeExp | ||
## 1 Belgium 2002    Europe  78.320 | ||
## 2  France 1997    Europe  78.640 | ||
## 3  France 2002    Europe  79.590 | ||
## 4   Japan 1997      Asia  80.690 | ||
## 5   Japan 2002      Asia  82.000 | ||
## 6   Japan 2007      Asia  82.603 | ||
## 7 Nigeria 2007    Africa  46.859 | ||
``` | ||
|
||
Make this output: | ||
|
||
|
||
|
||
|
||
country   continent    nrows   max_year   min_lifeExp | ||
--------  ----------  ------  ---------  ------------ | ||
Belgium   Europe           1       2002        78.320 | ||
France    Europe           2       2002        78.640 | ||
Japan     Asia             3       2007        80.690 | ||
Nigeria   Africa           1       2007        46.859 | ||
|
||
### Cross-tabulate with holes | ||
|
||
From `hdat` (produced by the code above): | ||
|
||
```r | ||
hdat | ||
##   country year continent lifeExp | ||
## 1 Belgium 2002    Europe  78.320 | ||
## 2  France 1997    Europe  78.640 | ||
## 3  France 2002    Europe  79.590 | ||
## 4   Japan 1997      Asia  80.690 | ||
## 5   Japan 2002      Asia  82.000 | ||
## 6   Japan 2007      Asia  82.603 | ||
## 7 Nigeria 2007    Africa  46.859 | ||
``` | ||
|
||
Make this output (it should be a data.frame): | ||
|
||
|
||
|
||
|
||
continent    1997   2002   2007 | ||
----------  -----  -----  ----- | ||
Africa         NA     NA      1 | ||
Asia            1      1      1 | ||
Europe          1      2     NA |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
--- | ||
title: Exercises to test and solidify your data manipulation skills | ||
output: | ||
html_document: | ||
toc: true | ||
toc_depth: 4 | ||
--- | ||
|
||
*NOTE: Not completed or used. It is a start on a set of data manipulation challenges, but I lost too much time tracking down a puzzle in the `spread()` example. It turned out to be a bug in `dplyr`. See [this issue](https://github.com/hadley/tidyr/issues/32) or [the one I opened and closed](https://github.com/hadley/tidyr/issues/42).* | ||
|
||
```{r setup, include = FALSE, cache = FALSE} | ||
knitr::opts_chunk$set(error = TRUE, collapse = TRUE) | ||
``` | ||
|
||
```{r} | ||
library(dplyr) | ||
library(reshape2) | ||
library(tidyr) | ||
gdat <- read.delim("gapminderDataFiveYear.tsv") | ||
``` | ||
|
||
|
||
### Aggregate or summarize | ||
|
||
From this input: | ||
|
||
```{r} | ||
(hdat <- gdat %>% | ||
  filter(country %in% c('France', 'Belgium', 'Nigeria', 'Japan'), | ||
         year > 1996) %>% | ||
  select(country, year, continent, lifeExp) %>% | ||
  filter( (country == 'Japan') | | ||
          (country == 'Belgium' & year == 2002) | | ||
          (country == 'France' & year < 2005) | | ||
          (country == 'Nigeria' & year > 2002))) | ||
``` | ||
|
||
Make this output: | ||
|
||
```{r include = FALSE} | ||
idat <- hdat %>% | ||
  group_by(country, continent) %>% | ||
  summarize(nrows = n(), | ||
            max_year = max(year), | ||
            min_lifeExp = min(lifeExp)) | ||
``` | ||
|
||
```{r echo = FALSE} | ||
knitr::kable(as.data.frame(idat)) | ||
``` | ||
|
||
### Cross-tabulate with holes | ||
|
||
From `hdat` (produced by the code above): | ||
```{r} | ||
hdat | ||
``` | ||
|
||
Make this output (it should be a data.frame): | ||
|
||
```{r include = FALSE} | ||
(jdat <- hdat %>% | ||
  group_by(continent, year) %>% | ||
  tally()) | ||
## tidyr::spread() | ||
kdat <- jdat %>% | ||
  ungroup() %>% # necessary temporarily; fix coming to dplyr! | ||
  spread(year, n) | ||
## reshape2::dcast() | ||
#dcast(jdat, continent ~ year, value.var = "n") | ||
``` | ||
|
||
```{r echo = FALSE} | ||
knitr::kable(as.data.frame(kdat)) | ||
``` |
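
For comparison, the "cross-tabulate with holes" target can also be sketched in base R, without `spread()` or the temporary `ungroup()` workaround. This is an illustrative alternative, not the approach taken in the commit. It assumes `hdat` can be rebuilt by hand from the printed output above, and the names `tab` and `wide` are introduced here purely for the example.

```r
# hdat rebuilt by hand from the printed output (assumption: values copied as shown)
hdat <- data.frame(
  country   = c("Belgium", "France", "France", "Japan", "Japan", "Japan", "Nigeria"),
  year      = c(2002, 1997, 2002, 1997, 2002, 2007, 2007),
  continent = c("Europe", "Europe", "Europe", "Asia", "Asia", "Asia", "Africa"),
  lifeExp   = c(78.320, 78.640, 79.590, 80.690, 82.000, 82.603, 46.859)
)

# count rows per continent x year; table() fills empty cells with 0
tab <- table(hdat$continent, hdat$year)

# convert the two-way table to a data.frame with one column per year
wide <- as.data.frame.matrix(tab)

# turn the zero counts into NA "holes"
wide[wide == 0] <- NA

# move continent out of the row names into a regular column
wide <- cbind(continent = rownames(wide), wide)
rownames(wide) <- NULL

wide
```

Because `table()` starts from zeros rather than missing cells, the `NA` assignment is the step that creates the holes; leave it out if zero counts are the desired output.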