---
title       : What is data?
subtitle    : 
author      : Jeffrey Leek
job         : Johns Hopkins Bloomberg School of Public Health
logo        : bloomberg_shield.png
framework   : io2012        # {io2012, html5slides, shower, dzslides, ...}
highlighter : highlight.js  # {highlight.js, prettify, highlight}
hitheme     : tomorrow      # 
url:
  lib: ../../libraries
  assets: ../../assets
widgets     : [mathjax]     # {mathjax, quiz, bootstrap}
mode        : selfcontained # {standalone, draft}
---

```{r setup, cache = F, echo = F, message = F, warning = F, tidy = F}
# make this an external chunk that can be included in any file
options(width = 70)
opts_chunk$set(message = F, error = F, warning = F, echo = F, comment = NA, fig.align = 'center', dpi = 100, tidy = F, cache = T, cache.path = '.cache/', fig.path = 'fig/')

options(xtable.type = 'html')
# format inline output: round numbers, join everything else with commas
knit_hooks$set(inline = function(x) {
  if(is.numeric(x)) {
    round(x, getOption('digits'))
  } else {
    paste(as.character(x), collapse = ', ')
  }
})
knit_hooks$set(plot = knitr:::hook_plot_html)
```

## Definition of data

Data are values of qualitative or quantitative variables, belonging to a set of items.

[http://en.wikipedia.org/wiki/Data](http://en.wikipedia.org/wiki/Data)

---

## Definition of data

Data are values of qualitative or quantitative variables, belonging to a set of items.

[http://en.wikipedia.org/wiki/Data](http://en.wikipedia.org/wiki/Data)

__Set of items__: Sometimes called the population; the set of objects you are interested in.

---

## Definition of data

Data are values of qualitative or quantitative variables, belonging to a set of items.

[http://en.wikipedia.org/wiki/Data](http://en.wikipedia.org/wiki/Data)

__Variables__: A measurement or characteristic of an item.

---

## Definition of data

Data are values of qualitative or quantitative variables, belonging to a set of items.

[http://en.wikipedia.org/wiki/Data](http://en.wikipedia.org/wiki/Data)

__Qualitative__: Country of origin, sex, treatment

__Quantitative__: Height, weight, blood pressure

---

## What do data look like?

[http://brianknaus.com/software/srtoolbox/s_4_1_sequence80.txt](http://brianknaus.com/software/srtoolbox/s_4_1_sequence80.txt)

---

## What do data look like?

[https://dev.twitter.com/docs/api/1/get/blocks/blocking](https://dev.twitter.com/docs/api/1/get/blocks/blocking)

---

## What do data look like?

[http://blue-button.github.com/challenge/](http://blue-button.github.com/challenge/)

---

## What do data look like?

[http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html?pagewanted=all&_r=0](http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html?pagewanted=all&_r=0)

---

## What do data look like?

[http://www.pnas.org/content/109/30/12081.full](http://www.pnas.org/content/109/30/12081.full)

[https://soundcloud.com/uncoolbob/sets/darwintunes](https://soundcloud.com/uncoolbob/sets/darwintunes)

---

## What do data look like?

[http://www.data.gov/](http://www.data.gov/)

---

## What do data look like?

Rarely

---

## The data is the second most important thing

* The most important thing in data science is the question
* The second most important is the data
* Often the data will limit or enable the questions
* But having data can't save you if you don't have a question