---
title : What is data?
subtitle :
author : Jeffrey Leek
job : Johns Hopkins Bloomberg School of Public Health
logo : bloomberg_shield.png
framework : io2012 # {io2012, html5slides, shower, dzslides, ...}
highlighter : highlight.js # {highlight.js, prettify, highlight}
hitheme : tomorrow #
url:
  lib: ../../libraries
  assets: ../../assets
widgets : [mathjax] # {mathjax, quiz, bootstrap}
mode : selfcontained # {standalone, draft}
---
```{r setup, cache = F, echo = F, message = F, warning = F, tidy = F}
# make this an external chunk that can be included in any file
options(width = 70)
opts_chunk$set(message = F, error = F, warning = F, echo = F, comment = NA, fig.align = 'center', dpi = 100, tidy = F, cache = T, cache.path = '.cache/', fig.path = 'fig/')
options(xtable.type = 'html')
knit_hooks$set(inline = function(x) {
  if(is.numeric(x)) {
    round(x, getOption('digits'))
  } else {
    paste(as.character(x), collapse = ', ')
  }
})
knit_hooks$set(plot = knitr:::hook_plot_html)
```
## Definition of data
<q>Data are values of qualitative or quantitative variables, belonging to a set of items.</q>
[http://en.wikipedia.org/wiki/Data](http://en.wikipedia.org/wiki/Data)
---
## Definition of data
<q>Data are values of qualitative or quantitative variables, belonging to a <redtext>set of items</redtext>.</q>
[http://en.wikipedia.org/wiki/Data](http://en.wikipedia.org/wiki/Data)
__Set of items__: Sometimes called the population; the set of objects you are interested in
---
## Definition of data
<q>Data are values of qualitative or quantitative <redtext>variables</redtext>, belonging to a set of items.</q>
[http://en.wikipedia.org/wiki/Data](http://en.wikipedia.org/wiki/Data)
__Variables__: A measurement or characteristic of an item.
---
## Definition of data
<q>Data are values of <redtext>qualitative</redtext> or <redtext>quantitative</redtext> variables, belonging to a set of items.</q>
[http://en.wikipedia.org/wiki/Data](http://en.wikipedia.org/wiki/Data)
__Qualitative__: Country of origin, sex, treatment
__Quantitative__: Height, weight, blood pressure
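
The distinction can be sketched in R, where qualitative variables are commonly stored as factors and quantitative variables as numeric vectors. This is only an illustration: the `patients` data frame, its column names, and its values are made up, and classifying a variable by `is.numeric` is a simplifying assumption (a numeric code such as a zip code would still be qualitative).

```r
# Hypothetical set of items: three patients, with made-up values
patients <- data.frame(
  country   = factor(c("Kenya", "Brazil", "Kenya")),  # qualitative
  treatment = factor(c("drug", "placebo", "drug")),   # qualitative
  height_cm = c(172.5, 160.2, 181.0),                 # quantitative
  weight_kg = c(68.1, 55.4, 90.3)                     # quantitative
)

# Label each variable by how it is stored (a rough heuristic, not a rule)
variable_type <- sapply(patients, function(v) {
  if (is.numeric(v)) "quantitative" else "qualitative"
})
print(variable_type)
```

Each row is one item from the set, and each column is one variable measured on that item.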
---
## What do data look like?
<img class=center src="../../assets/img/01_DataScientistToolbox/fastq.png" height=400 />
[http://brianknaus.com/software/srtoolbox/s_4_1_sequence80.txt](http://brianknaus.com/software/srtoolbox/s_4_1_sequence80.txt)
---
## What do data look like?
<img class=center src="../../assets/img/01_DataScientistToolbox/twitter.png" height=400 />
[https://dev.twitter.com/docs/api/1/get/blocks/blocking](https://dev.twitter.com/docs/api/1/get/blocks/blocking)
---
## What do data look like?
<img class=center src="../../assets/img/01_DataScientistToolbox/medicalrecord.png" height=400 />
[http://blue-button.github.com/challenge/](http://blue-button.github.com/challenge/)
---
## What do data look like?
<img class=center src="../../assets/img/01_DataScientistToolbox/deepcat.jpg" height=400 />
[http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html?pagewanted=all&_r=0](http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html?pagewanted=all&_r=0)
---
## What do data look like?
<img class=center src="../../assets/img/01_DataScientistToolbox/darwintunes.png" height=400 />
[http://www.pnas.org/content/109/30/12081.full](http://www.pnas.org/content/109/30/12081.full)
[https://soundcloud.com/uncoolbob/sets/darwintunes](https://soundcloud.com/uncoolbob/sets/darwintunes)
---
## What do data look like?
<img class=center src="../../assets/img/01_DataScientistToolbox/datagov.png" height=400 />
[http://www.data.gov/](http://www.data.gov/)
-------
## What do data look like? Rarely
<img class=center src="../../assets/img/01_DataScientistToolbox/excel.png" height=400 />
-------
## The data are the second most important thing
* The most important thing in data science is the question
* The second most important is the data
* Often the data will limit or enable the questions
* But having data can't save you if you don't have a question