slides2.Rpres

<style>

.reveal .slides > sectionx {
    top: -70%;
}

.reveal pre code.r {background-color: #ccF}

.section .reveal li {color:white}
.section .reveal em {font-weight: bold; font-style: "none"}

</style>


Text Analysis in R
========================================================
author: Wouter van Atteveldt
date: Session 2: Transforming Data & APIs

Course Overview
===
type:section 

Thursday: Introduction to R
  + Organizing data
  + *Transforming data*
  + Accessing APIs from R

Friday: Corpus Analysis & Topic Modeling

Saturday: Machine Learning & Sentiment Analysis

Sunday: Semantic Networks & Grammatical Analysis


Transforming data
====
type:section

Combining data

Reshaping data

Combining data
=====

```{r, echo=F}
df = data.frame(id=1:3, age=c(14, 18, 24), 
          name=c("Mary", "John", "Luke"))
```

```{r}
cbind(df, country=c("nl", "uk", "uk"))
rbind(df, c(id=1, age=2, name="Mary"))
```

Merging data
===

```{r}
countries = data.frame(id=1:2, country=c("nl", "uk"))
merge(df, countries)
merge(df, countries, all=T)
```

Merging data
===

```{r, eval=F}
merge(data1, data2)
merge(data1, data2, by="id")
merge(data1, data2, by.x="id", by.y="ID")
merge(data1, data2, by="id", all=T)
merge(data1, data2, by="id", all.x=T)
```

Reshaping data
===

+ `reshape2` package:
  + `melt`: wide to long
  + `dcast`: long to wide (pivot table) 

Melting data
===

```{r}
wide = data.frame(id=1:3, 
  group=c("a","a","b"), 
  width=c(100, 110, 120), 
  height=c(50, 100, 150))
wide
```

Melting data
===

```{r}
library(reshape2)
long = melt(wide, id.vars=c("id", "group"))
long
```


Casting data
===

```{r}
dcast(long, id + group ~ variable, value.var="value")
```

Casting data: aggregation
===

```{r}
dcast(long, group ~ variable, value.var = "value", fun.aggregate = max)
dcast(long, id ~., value.var = "value", fun.aggregate = mean)
```

Aggregation with `aggregate`
===

```{r}
aggregate(long["value"], long["group"], max)
```

`aggregate` vs `dcast`
===

Aggregate
+ One aggregation function
+ Multiple value columns
+ Groups go in rows (long format)
+ Specify with column subsets

Cast
+ One aggregation function
+ One value column
+ Groups go in rows or columns
+ Specify with formula (`rows ~ columns`)


Simple statistics
===

Vector properties

```{r, eval=F}
mean(x)
sd(x)
sum(x)
```

Basic tests

```{r, eval=F}
t.test(wide, width ~ group)
t.test(wide$width, wide$height, paired=T)
cor.test(wide$width, wide$height)
m = lm(long, width ~ group + height)
summary(m)
```


Interactive 2a
====
type: section

Transforming data in R


Course Overview
===
type:section 

Thursday: Introduction to R
  + Organizing data
  + Transforming data
  + *Accessing APIs from R*

Friday: Corpus Analysis & Topic Modeling

Saturday: Machine Learning & Sentiment Analysis

Sunday: Semantic Networks & Grammatical Analysis


What is an API?
===

+ Application Programming Interface
+ Computer-friendly web page
  + Standardized requests
  + Structured response
    + json/ csv
+ Access directly (HTTP call)
+ Client library for popular APIs

Demo: APIs and HTTP requests
===
type: section
    
Package twitteR
===

```{r, eval=F}
install_github("geoffjentry/twitteR") 
setup_twitter_oauth(...)
tweets = searchTwitteR("#Trump2016", resultType="recent", n = 10)
tweets = plyr::ldply(tweets, as.data.frame)
```

Package Rfacebook
===

```{r, eval=F}
install_github("pablobarbera/Rfacebook", subdir="Rfacebook")
fb_token = fbOAuth(fb_app_id, fb_app_secret)
p = getPage(page="nytimes", token=fb_token)
post = getPost(p$id[1], token=fb_token)
```

Package rtimes
====

```{r, eval=F}
install.packages("rtimes")
options(nytimes_as_key = nyt_api_key)

res = as_search(q="trump", 
  begin_date = "20160101", 
  end_date = '20160501')

arts = plyr::ldply(res$data, 
  function(x) c(headline=x$headline$main, 
                date=x$pub_date))
```

APIs and rate limits
===

+ Most APIs have access limits
+ Log on with key or token
+ Response size (page) limited to n results
+ Requests limited to n per hour/day
+ Some clients deal with this, some don't
+ See API and client documentation


Directly accessing APIs
===

+ Make HTTP requests directly from R
  + package `httr` (or `RCurl`)
+ Can access all web data source
+ Need to figure out authentication, structure, etc


Directly accessing APIs
===

```{r, eval=F}
domain = 'https://api.nytimes.com'
path = 'svc/search/v2/articlesearch.json'
url = paste(domain, path, url, sep='/')
query = list(`api-key`=key, q="clinton")
r = httr::GET(url, query=query)
status_code(r)
result = content(r)
result$response$docs[[1]]$headline
```


Interactive 2b
====
type: section

Accessing APIs

Hands-on 2
====
type: section

Break

Hand-outs:
+ Transforming data
+ Accesing APIs
+ Retrieve your own data 
+ Bonus: modeling and visualizing

Mini-project:
Retrieve data about a topic of your interest

Course Overview
===
type:section 

Thursday: Introduction to R
  + Organizing data
  + Transforming data
  + Accessing APIs from R

Friday: Corpus Analysis & Topic Modeling

Saturday: Machine Learning & Sentiment Analysis

Sunday: Semantic Networks & Grammatical Analysis