Skip to content

jbryer/sqlutils

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

41 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

The sqlutils Package

The sqlutils package provides a set of utility functions to help manage a library of structured query language (SQL) files. The package can be installed from Github using the devtools package.

devtools::install_github('jbryer/sqlutils')

The sqlutils package provides functions to document, cache, and execute SQL queries. The location of the SQL files is determined by the sqlPaths() function. This function behaves in a manner consistent with the .libPaths() function. By default, a single path will be defined being the data directory where the sqlutils package is installed.

> sqlPaths()
[1] "/Users/jbryer/R/sqlutils/data"

Additional search paths can be added using sqlPaths('/Path/To/SQL/Files'). By convention, sqlutils will work with any plain text files with a .sql file extention in any of the directories returned from sqlPaths(). In the case of multiple files with the same name, first one wins.

In addition to working with a library (directory) of SQL files, sqlutils recognizes roxygen2 style documentation. The StudentsInRange script (located in the data directory of the installed package), exemplifies how to create a SQL query with two parameters as well as how to define those parameters and provide default values. Default values are used when the user fails to supply values within the execQuery or cacheQuery functions (described in detail bellow). The available documenations tages are:

  • @param paramName - This provides a description of the parameter.
  • @default paramName - This defines the default value. This can be any valid R statement.
  • @return columnName - Provides documentation for any returned columns.

The contents of the StudentsInRange query follows:

#' Students enrolled within the given date range.
#' 
#' @param startDate the start of the date range to return students.
#' @default startDate format(Sys.Date(), '%Y-01-01')
#' @param endDate the end of the date range to return students.
#' @default endDate format(Sys.Date(), '%Y-%m-%d')
#' @return CreatedDate the date the row was added to the warehouse data.
#' @return StudentId the student id.
SELECT * 
FROM students 
WHERE CreatedDate >= ':startDate:' AND CreatedDate <= ':endDate:'

It should be noted that parameters are replaced just before executing the query and must be contained with a pair of colons (:) and be valid R object names (i.e. not start with a number, contain spaces, or special characters).

We can now retrieve the documentation from within R using the sqldoc command.

> sqldoc('StudentsInRange')
Students enrolled within the given date range.
Parameters:
     param                                            desc                        default default.val
 startDate the start of the date range to return students. format(Sys.Date(), '%Y-01-01')  2012-01-01
   endDate   the end of the date range to return students. format(Sys.Date(), '%Y-%m-%d')  2012-11-19
Returns (note that this list may not be complete):
    variable                                              desc
 CreatedDate the date the row was added to the warehouse data.
   StudentId                                   the student id.

The required parameters can also be retrieved using the getParameters function.

> getParameters('StudentsInRange')
[1] "startDate" "endDate"

In the case there are no parameters, an empty character vector is returned.

> getParameters('StudentSummary')
character(0)

A list of all available queries is returned using the getQueries() function.

> getQueries()
 [1] "StudentsInRange" "StudentSummary" 

There are two functions available to execute queries, execQuery and cacheQuery. The former will send the SQL query to the database upon every execution. The latter however, maintains a local cached version (as a CSV or Rda file) of the resulting data frame. Specifically, the function creates a unique filename based upon the query name and parameters (see getCacheFilename function; this can also be overwritten using the filename parameter). If that file exists in specified directory (the current working directory by default), then it reads the file from disk and returns that. If the file does not exist, then execQuery is called, the result data frame saved to disk, and then the data frame is returned. The following complete example loads the students data frame from the retention package, saves it to a SQLite database, and executes the two included queries.

> require(RSQLite)
> sqlfile <- paste(system.file(package='sqlutils'), '/db/students.db', sep='')
> m <- dbDriver("SQLite")
> conn <- dbConnect(m, dbname=sqlfile)
> q1 <- execQuery('StudentSummary', connection=conn)
> head(q1)
  CreatedDate count
1  2002-07-15  8365
2  2002-08-15  8251
3  2002-09-15  8259
4  2002-10-15  8258
5  2002-11-15  8151
6  2002-12-15  8415

Supported databases

The sqlutils package supports database access using the RODBC, RSQLite, RPostgreSQL, and RMySQL packages using an S3 generic function call called sqlexec based upon the class of the connection parameter. For example, create a new database connection for connections of class foo, the following provides the skeleton of the function to implement:

sqlexec.foo <- function(connection, sql, ...) {
	#Database implementation here.
	#The ... will be passed through from the execQuery call. 
}

About

Utilies for managing libraries of SQL queries.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages