# R Workbook 3: Using R with SQL

In this notebook, we provide a short demonstration of how you might use R and SQL together. Generally, you want to store and handle data using SQL, but do more sophisticated analyses in R. To get the benefits of both, you might start by exploring the data and getting basic summaries in SQL (using something like DBeaver, for example), then move to R once you decide which subset you want to work with. This notebook shows how that handoff between SQL and R might work.

We'll start by bringing in packages as usual. We use the typical `tidyverse` suite of packages, as well as the `DBI` and `odbc` packages. The latter two are used to connect to the database.

In [None]:
# Installing the tidyverse also installs dbplyr; library(tidyverse) attaches only the core packages
library(tidyverse)

# For connection to SQL: DBI provides dbConnect() and dbGetQuery(); odbc provides the driver
library(DBI)
library(odbc)

In [None]:
# Import the file with hints and solutions
source("r3_hints_and_solutions.txt")

## Creating a Connection

The first step is to create a connection to the database using `dbConnect`, which lets you connect to the server hosting your data. The arguments to this function vary with the connection type, database type, and database location. However, this block of code is what you will use to establish a connection any time you access data stored in the ADRF. We assign the connection to the variable `con` and reference this object every time we interact with or query the database.

In [None]:
con <- DBI::dbConnect(odbc::odbc(), 
                     Driver = "SQL Server", 
                     Server = "msssql01.c7bdq4o2yhxo.us-gov-west-1.rds.amazonaws.com",
                     Trusted_Connection = "True")
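
Because `con` is reused for every query, it can help to check that it is still open and to close it when you are finished. The sketch below is only an illustration: it assumes the `RSQLite` package is installed and uses an in-memory SQLite database as a stand-in (the name `demo_con` is hypothetical), so that it runs without access to the ADRF server. The same `dbIsValid` and `dbDisconnect` calls from `DBI` should apply to `con`.

```r
library(DBI)

# Stand-in connection (assumption): an in-memory SQLite database, not the ADRF server
demo_con <- dbConnect(RSQLite::SQLite(), ":memory:")

# TRUE while the connection is open
is_open <- dbIsValid(demo_con)

# Close the connection once you are done querying
dbDisconnect(demo_con)

# After disconnecting, the connection is no longer valid
is_closed <- !dbIsValid(demo_con)
```

In this notebook you would call `dbDisconnect(con)` only after running your last query.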

## Running Queries

In our notebooks, we run queries directly, passing the exact SQL code to the `dbGetQuery` function.


### Using `dbGetQuery`

You can run queries using the `dbGetQuery` function. It takes two arguments: the connection (which we created above as `con`) and a character string containing the query. Here we query the table `lodes_ca_od_main_JT00_2015`, which is located in the database `ds_public_1` and the schema `dbo`. This particular query returns ten rows from that table.

In [None]:
ca_od <- dbGetQuery(con, 'SELECT TOP 10 * FROM ds_public_1.dbo.lodes_ca_od_main_JT00_2015')

In [None]:
head(ca_od)
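
The query passed to `dbGetQuery` is an ordinary R character string, so it can also be assembled from its parts. The following is a minimal sketch of that idea using base R's `sprintf`; the variable names (`database`, `schema`, `tbl`, `n_rows`, `qry`) are hypothetical and chosen only for illustration.

```r
# Pieces of the fully qualified table name used in the query above
database <- "ds_public_1"
schema   <- "dbo"
tbl      <- "lodes_ca_od_main_JT00_2015"
n_rows   <- 10L  # integer, to match the %d format below

# Build the same query string as the literal passed to dbGetQuery above
qry <- sprintf("SELECT TOP %d * FROM %s.%s.%s", n_rows, database, schema, tbl)
print(qry)

# With an open connection `con`, the string is used like a literal query:
# ca_od <- dbGetQuery(con, qry)
```

Building queries this way is reasonable when you control every value yourself. Text that comes from outside your own code should not be pasted into a query string.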

## <span style="color:red">Checkpoint 1: Read in Other Data</span>

We've included Illinois (IL) LODES data to work with in the database. The table, located in the same database and schema as in the example above, is named `il_wac_S000_JT00_2015`.

See if you can load this table the same way you loaded the California data above. In addition, try loading 15 records instead of 10.

In [None]:
# checkpoint_1.hint()

In [None]:
# checkpoint_1.solution()