# R Workbook 3: Using R with SQL

In this notebook, we provide a short demonstration of how you might use R and SQL together. Generally, you want to store and handle data using SQL, but do more sophisticated analyses in R. To get the benefits of both, you might start by exploring the data and getting basic summaries in SQL (using something like DBeaver, for example), then move to R once you have decided which subset you want to work with. This notebook shows how that interface between the two might work.

We'll start by bringing in packages as usual. We use the typical `tidyverse` suite of packages, as well as the `DBI` and `RSQLite` packages. These last two are used to connect to the database. In our example, we will use an SQLite database, but for different databases, you might use a different package (such as `RPostgreSQL` for PostgreSQL databases).

In [8]:
# Installing tidyverse also installs dbplyr; dplyr uses it for database tables
library(tidyverse)

# For connection to SQL
library(DBI)
library(RSQLite)

## Creating a Connection

The first step is to create a connection to the database. In our example, we've included an SQLite database called `lodes.db` containing some of the data, so we'll connect to that. In general, you may need to specify more than what we've included here, depending on, for example, whether permissions are set on the database.

In [9]:
con <- dbConnect(SQLite(), dbname = "lodes.db")
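Once a connection exists, it can help to check what it contains before writing any queries. The following is a minimal sketch rather than part of the original workflow: it assumes an in-memory SQLite database (`":memory:"`) as a stand-in for `lodes.db`, and the connection name `demo_con` and the table `demo` are hypothetical.

```r
library(DBI)
library(RSQLite)

# Assumption: an in-memory database stands in for lodes.db
demo_con <- dbConnect(SQLite(), dbname = ":memory:")

# Hypothetical toy table, written only so there is something to inspect
dbWriteTable(demo_con, "demo", data.frame(id = 1:3, value = c(10, 20, 30)))

# List the tables available through the connection
tables <- dbListTables(demo_con)

# List the column names of one table
fields <- dbListFields(demo_con, "demo")

# Close the connection when finished
dbDisconnect(demo_con)
```

The same `dbListTables` and `dbListFields` calls should work on `con` to see which tables `lodes.db` contains.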

If we were using a different flavor of SQL (such as PostgreSQL), we would create the connection slightly differently. For example, we could use the `RPostgreSQL` package instead of `RSQLite`. Once the connection exists, though, everything afterwards is the same: you can use it to run SQL code and do all the same things, such as bringing a table into R as a data frame.

## Running Queries

There are two main ways of running SQL queries from R that we'll go over. They are:

- Running queries directly using the exact SQL code with the `dbGetQuery` function.
- Using `dbplyr` from `tidyverse` to run queries with `dplyr` syntax.

We'll start with the first one.

### Using `dbGetQuery`

You can run queries using the `dbGetQuery` function. It takes the connection (which we created above as `con`) and a character string containing the query, and returns the result as a data frame.

In [12]:
test_table <- dbGetQuery(con, 'SELECT * FROM ca_wac_2015 LIMIT 10')
test_table

w_geocode,c000,ca01,ca02,ca03,ce01,ce02,ce03,cns01,cns02,⋯,cfa02,cfa03,cfa04,cfa05,cfs01,cfs02,cfs03,cfs04,cfs05,createdate
<chr>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,⋯,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<chr>
60014001001007,30,2,16,12,4,2,24,0,0,⋯,0,0,0,0,0,0,0,0,0,20170919
60014001001008,4,0,1,3,0,0,4,0,0,⋯,0,0,0,0,0,0,0,0,0,20170919
60014001001011,3,2,1,0,0,3,0,0,0,⋯,0,0,0,0,0,0,0,0,0,20170919
60014001001017,11,3,3,5,2,2,7,0,0,⋯,0,0,0,0,0,0,0,0,0,20170919
60014001001024,10,3,3,4,7,1,2,0,0,⋯,0,0,0,0,0,0,0,0,0,20170919
60014001001026,3,0,2,1,0,2,1,0,0,⋯,0,0,0,0,0,0,0,0,0,20170919
60014001001027,13,3,3,7,4,5,4,0,0,⋯,0,0,0,0,0,0,0,0,0,20170919
60014001001032,13,2,4,7,3,2,8,0,0,⋯,0,0,0,0,0,0,0,0,0,20170919
60014001001033,2,0,0,2,2,0,0,0,0,⋯,0,0,0,0,0,0,0,0,0,20170919
60014001001034,1,0,0,1,1,0,0,0,0,⋯,0,0,0,0,0,0,0,0,0,20170919
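Because the query is just a character string, `dbGetQuery` can also run filters and aggregations, and DBI lets you pass values separately from the SQL text through the `params` argument. The sketch below is illustrative only: it assumes an in-memory SQLite database and a hypothetical table `toy_wac` with made-up geocodes, not the real `ca_wac_2015` data.

```r
library(DBI)
library(RSQLite)

# Assumption: in-memory database with a hypothetical toy table
demo_con <- dbConnect(SQLite(), dbname = ":memory:")
toy_wac <- data.frame(
  w_geocode = c("A", "B", "C", "D"),   # made-up block identifiers
  c000 = c(30L, 4L, 3L, 11L),          # made-up job counts
  stringsAsFactors = FALSE
)
dbWriteTable(demo_con, "toy_wac", toy_wac)

# Parameterized query: the ? placeholder is filled from params,
# so the threshold is not pasted into the SQL string
big_blocks <- dbGetQuery(
  demo_con,
  "SELECT w_geocode, c000 FROM toy_wac WHERE c000 >= ? ORDER BY c000 DESC",
  params = list(10)
)

# Aggregation done in SQL; only the one-row summary comes back to R
summary_row <- dbGetQuery(
  demo_con,
  "SELECT COUNT(*) AS n_blocks, SUM(c000) AS total_jobs FROM toy_wac"
)

dbDisconnect(demo_con)
```

Doing the filtering and summarizing in SQL means only the rows you need are pulled into R, which matters when the full table is large.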


### Using `dbplyr`

In [14]:
ca_wac <- tbl(con, 'ca_wac_2015')
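A table reference created with `tbl` is lazy: `dplyr` verbs applied to it are translated to SQL and run in the database only when the results are requested. The following minimal sketch shows that pattern under stated assumptions: it uses an in-memory SQLite database and a hypothetical table `toy_wac` with made-up values, and it requires the `dbplyr` package to be installed.

```r
library(dplyr)
library(DBI)
library(RSQLite)

# Assumption: in-memory database with a hypothetical toy table
demo_con <- dbConnect(SQLite(), dbname = ":memory:")
dbWriteTable(
  demo_con, "toy_wac",
  data.frame(w_geocode = c("A", "B", "C", "D"), c000 = c(30L, 4L, 3L, 11L))
)

# Lazy reference to the table; no rows are pulled into R yet
toy_tbl <- tbl(demo_con, "toy_wac")

# dplyr verbs build up a query instead of running immediately
lazy_query <- toy_tbl %>%
  filter(c000 >= 10) %>%
  select(w_geocode, c000) %>%
  arrange(desc(c000))

# Print the SQL that dbplyr generated for this pipeline
show_query(lazy_query)

# collect() runs the query and returns the result as a local tibble
result <- collect(lazy_query)

dbDisconnect(demo_con)
```

Calling `collect()` only at the end keeps the heavy work in the database and brings just the final subset into R.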

In [18]:
head(ca_wac)

# Source:   lazy query [?? x 53]
# Database: sqlite 3.30.1 [/home/bkim/Documents/Projects/ada-intro-r/lodes.db]
  w_geocode  c000  ca01  ca02  ca03  ce01  ce02  ce03 cns01 cns02 cns03 cns04
  <chr>     <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 06001400…    30     2    16    12     4     2    24     0     0     0     0
2 06001400…     4     0     1     3     0     0     4     0     0     0     0
3 06001400…     3     2     1     0     0     3     0     0     0     0     0
4 06001400…    11     3     3     5     2     2     7     0     0     0     0
5 060014