
initial release

daroczig committed Apr 3, 2012
0 parents commit e8d7f5d7b44bc5c346cf2837de1b322791ea8e32
@@ -0,0 +1,20 @@
Package: sandboxR
Maintainer: Gergely Daróczi <gergely@snowl.net>
Title: Filtering "malicious" calls
Type: Package
Encoding: UTF-8
Description: This POC package tries to filter "malicious" calls in R
    expressions based on a blacklist to let shared R instances be safe from
    file and system calls.
Author: Gergely Daróczi <gergely@snowl.net>
Version: 0.1
Date: 2012-04-02
URL: https://github.com/daroczig/sandboxR
BugReports: https://github.com/daroczig/sandboxR/issues
License: AGPL-3
Imports:
    parser
Collate:
    'masked.functions.R'
    'sandbox.R'
    'sandboxR.R'
27 FAQ.md
@@ -0,0 +1,27 @@
**This file is heavily under development (just like other parts of the package)!**

## Why don't you allow the `foo` function from package `bar`?

**In short**: as I do not need it :)

**Longer answer**: I do think that users should never touch the filesystem in a shared, web-based environment, as file handling should be done by the hosting application. E.g. if a user creates a plot in a script, it should be saved to an image on the disk automatically, without calling `png` or other R functions in the user-driven R session. Similarly, data upload should be done in the web application, not in the user-driven R session. Also, users should not use complex functions and package-like environments with S4 classes (not that I hate S4, I promise!), as those should be compiled into a package and submitted to CRAN. After that the package could be whitelisted :)

**Workaround**: the package uses a static list of blacklisted functions, but feel free to use your own list in your custom environment.
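A minimal sketch of what such a custom list could look like. The name `my.blacklist` and its entries are hypothetical examples, not the list shipped with the package. The only thing taken from the package is the flattening step that the masked functions apply to the result of `commands.blacklist()`.

```r
# Hypothetical custom blacklist, keyed by package name; entries are examples only.
my.blacklist <- list(
    base  = c("system", "unlink", "file.remove"),
    utils = c("download.file", "install.packages")
)

# Flatten to a plain character vector, the same way the masked functions
# treat the result of commands.blacklist()
blacklist <- as.character(unlist(my.blacklist))

# Names used in a candidate expression that appear on the custom list
found <- intersect(all.names(quote(unlink("results.csv"))), blacklist)
print(found)
```

How such a list would be plugged into `sandbox` is not shown here, as that depends on the parts of the package that are not rendered in this commit view.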

## Filtering resource-hungry calls

There are some function calls which are "malicious" but are not addressed by *sandboxR*. These usually try to consume a lot of resources for nothing: they do not modify files, they "just" waste server resources.

A great example from @Jeroen (http://stackoverflow.com/a/9145930/564164):

```
library(multicore);
forkbomb <- function(){
repeat{
parallel(forkbomb());
}
}
forkbomb();
```

These problems will never be addressed by *sandboxR*, as e.g. Apparmor can handle them with ease.
@@ -0,0 +1,6 @@
export(commands.blacklist)
export(paste0.masked)
export(paste.masked)
export(sandbox)
export(sprintf.masked)
importFrom(parser,parser)
4 NEWS
@@ -0,0 +1,4 @@
sandboxR 0.1 (2012-04-02)
----------------------------------------------------------------

Initial release on Github.
@@ -0,0 +1,61 @@
#' Masked paste
#'
#' Checks for forbidden function calls in constructed character vector before returning result.
#' @param ... see \code{paste}
#' @param sep see \code{paste}
#' @param collapse see \code{paste}
#' @seealso commands.blacklist
#' @examples \dontrun{
#' paste.masked('sys', 'tem', '(', sep = '')
#' paste.masked('xsx sax s system( dasf asf as url(')
#' paste.masked(c(letters[c(19, 25, 19, 20, 5, 13)], '('), collapse = "")
#' }
#' @export
paste.masked <- function(..., sep = '', collapse = NULL) {

    res <- base::paste(..., sep = sep, collapse = collapse)

    ## flag every blacklisted name that is followed by an opening parenthesis;
    ## any() keeps one flag per name even if `res` has several elements
    blacklist <- as.character(unlist(commands.blacklist()))
    blacklist.found <- which(vapply(sprintf('%s[ \t]*\\(', blacklist), function(pattern) any(grepl(pattern, res)), logical(1)))

    if (length(blacklist.found) > 0)
        stop(sprintf('Forbidden function name%s built: %s.', ifelse(length(blacklist.found) == 1, '', 's'), paste0(blacklist[blacklist.found], collapse = ', ')))

    return(res)

}


#' Masked paste0
#' @param ... see \code{paste0}
#' @param collapse see \code{paste0}
#' @examples \dontrun{
#' paste0.masked('sys', 'tem', '(')
#' }
#' @export
paste0.masked <- function(..., collapse = NULL) {

    sandboxR::paste.masked(..., sep = '', collapse = collapse)

}


#' Masked sprintf
#' @param fmt see \code{sprintf}
#' @param ... see \code{sprintf}
#' @export
sprintf.masked <- function(fmt, ...) {

    res <- base::sprintf(fmt, ...)

    ## flag every blacklisted name that is followed by an opening parenthesis;
    ## any() keeps one flag per name even if `res` has several elements
    blacklist <- as.character(unlist(commands.blacklist()))
    blacklist.found <- which(vapply(sprintf('%s[ \t]*\\(', blacklist), function(pattern) any(grepl(pattern, res)), logical(1)))

    if (length(blacklist.found) > 0)
        stop(sprintf('Forbidden function name%s built: %s.', ifelse(length(blacklist.found) == 1, '', 's'), paste0(blacklist[blacklist.found], collapse = ', ')))

    return(res)

}

Large diffs are not rendered by default.

@@ -0,0 +1,6 @@
#' \emph{sandboxR}: Filtering "malicious" calls in R
#'
#' @docType package
#' @importFrom parser parser
#' @name sandboxR
NULL
@@ -0,0 +1,93 @@
# sandboxR: *filtering "malicious" calls*

## Preface

This **POC** [R](http://www.r-project.org/) package tries to filter "malicious" calls in R expressions based on a blacklist to let shared R instances **be safe from file and system calls**.

*If you are not the kind of person who likes to read much in the morning about the theory and background of the $(n+1)^{th}$ R package, then please go straight to [testdriving the package in a browser](http://ec2-50-19-185-157.compute-1.amazonaws.com/) and **try to hack my system** with some guidance (see below)!*

Please note that I am aware of [Apparmor](http://wiki.apparmor.net/index.php/Main_Page), [SELinux](http://selinuxproject.org/page/Main_Page), [Tomoyo Linux](http://tomoyo.sourceforge.jp/index.html.en) and other Mandatory Access Control based filters, **and** this package is not intended to replace those implementations!

But there are some situations where a MAC based, kernel-level (mostly path based) filter cannot fully secure a system. Just think of logs and other commonly writable files, not to mention the executable/memory mappable libraries. For example, you might create a web application with @Jeff's really great tool ([RApache](http://rapache.net/)) or @Jeroen's similarly handy [Opencpu](http://opencpu.org/), and leave the `tempdir` system-wide writable to store generated images, uploaded files etc.

### Questions, motivations behind this package

Is it good practice to set up a MAC based filter so that users cannot reach any files on the server besides e.g. `/tmp`? Would the users not mess up each other's files, on purpose or by chance?

Are you sure some executable files in `lib` would not harm your system somehow?

How do you know what kind of diabolic actions could happen to your server when some of your users install a random package from Github with the help of `devtools`? Of course a MAC filter would stop all (most) of the attempts, but just imagine if someone packaged some nice root exploit :)

Well, the latter is rather sci-fi, but the above questions do stand in some situations. This package is an idea for those who are interested in such environments.

## Guidelines

The main idea for this little package was to act as a wrapper in **web applications**, where file and system calls are not needed, based on the following assumptions:

* images generated in R code are usually saved to disk by some internal mechanism (users should not issue `pdf` or `png` calls against the disk),
* datasets are usually uploaded by some internal mechanism after some checks (users should never run e.g. `read.table` against a file on the disk or a remote URL),
* users should never touch the filesystem outside of their little world (which is mostly manageable with Apparmor, but with limitations, e.g. you cannot secure the `tempdir` or any other common directory, or even your logs!),
* users should not deal with R environments, as the web application would prepare and set all those for them,
* users would not use R internal and deprecated functions.

Besides these, I also kept the following guidelines in mind to make an even **stricter sandbox-like environment**:

* users should not use the web application for testing/development purposes (bye-bye `debug`, most of `utils` and `methods`, profiling etc.),
* users might create some small functions in their files but would not deal with namespaces, `.Fortran` calls etc. If someone needs some more complex functions and methods, those should end up in a package, hopefully on Github or even CRAN :)
* users should not load R packages themselves, as the server would load all required/available packages on startup,
* there is no need for user-accessible character encoding functions, as a web application would store everything in the same encoding,
* users would not want to run spell check and other strange stuff from R on the server,
* and of course no interactive terminal is assumed.

Based on these I compiled a quite long list of functions that should be **blacklisted**.

The passed R sources are checked for blacklisted functions:

* if they are called (e.g.: `system('cat /etc/passwd')`),
* if they are copied under another name, i.e. "forked" (e.g.: `foo <- system`),
* if they can be found in any character vector kept for later evaluation (e.g.: `foo <- "system('cat /etc/passwd')"`),
* if they can be found in any character vector built dynamically (e.g.: `foo <- paste("","y", "tem", sep="s")`).
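The first three checks can be done statically on the parsed source. The following is only a rough sketch of that idea and not the package's own implementation: it assumes `utils::getParseData` from base R instead of the `parser` package that *sandboxR* imports, and both `find_blacklisted` and the three-name `blacklist` are hypothetical names made up for illustration. The fourth case (names built at run time) cannot be caught this way, which is why the package masks `paste`, `paste0` and `sprintf`.

```r
# Hypothetical three-name blacklist, for illustration only.
blacklist <- c("system", "unlink", "readLines")

# Return the blacklisted names that the R source in `src` refers to.
find_blacklisted <- function(src, blacklist) {
    tokens <- utils::getParseData(parse(text = src, keep.source = TRUE))

    # direct calls (system("ls")) and plain references (foo <- system)
    symbols <- tokens$text[tokens$token %in% c("SYMBOL_FUNCTION_CALL", "SYMBOL")]
    hits <- intersect(blacklist, symbols)

    # string constants holding "<name>(" that could be evaluated later
    strings <- tokens$text[tokens$token == "STR_CONST"]
    for (name in blacklist) {
        if (any(grepl(paste0(name, "[ \t]*\\("), strings)))
            hits <- union(hits, name)
    }

    hits
}

find_blacklisted('system("cat /etc/passwd")', blacklist)
find_blacklisted("foo <- system", blacklist)
find_blacklisted("foo <- \"system('cat /etc/passwd')\"", blacklist)
find_blacklisted("mean(1:10)", blacklist)
```

Under these assumptions the first three calls report `system`, while the last one returns an empty character vector.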

## Apologetics

*Bear in mind that this package is still in development and is not (**and might never be**) ready for production!*

As this is a *pre-alpha* release, you will find too many restrictions in this approach ATM; for example, the following functions are also blacklisted (for simplicity, and they **will be enabled later** for sure):

* get, mget
* assign
* attach, detach

Also, as I am not sure of this package's success, only base packages (`base`, `utils`, `methods`, `stats`, `graphics` and `grDevices`) are addressed.

## Testdrive!

Anyway, please feel free to **try** *and* **test** a [live (simple) web application which was built to test *sandboxR*](http://ec2-50-19-185-157.compute-1.amazonaws.com/)!

There I would **ask you to do your best at trying to hack the server**, like:

* reading the system-wide readable `/sandbox/secret` file from R,
* writing something to the system-wide writable `/sandbox/hello` file from R,
* or simply figuring out the root password on the machine :)

Please do send me feedback if you succeed or if you are tired of the overly strict restrictions!

## Frequently asked questions

Please see the dedicated file (FAQ.md).

## License

In short: this pseudo-package is licensed under **AGPL**.

More about this (and if I misinterpret AGPL, then this applies): please feel free to copy, use or modify/extend the sources for any open-sourced project. **But**: neither the sources nor my simple ideas expressed on this site may be used without my permission in any application which does not let users download its sources :)

## Special thanks

I would like to express my gratitude towards:

* Aleksandar Blagotić (@aL3xa) for working together
* Jeroen Ooms (@jeroenooms) for security related discussions, for his hints and for his unbelief :)
* [@DWin and Hadley Wickham (@hadley)](http://stackoverflow.com/questions/8379570/get-functions-title-from-documentation) for teaching me how to parse helpfiles
* my wife and the smartest little guy in the world (@Botond) for their tolerance and support
* and a bout of flu, which gave me some "spare" time to implement this.
@@ -0,0 +1,19 @@
context('filtering blacklisted functions')

test_that('called functions', {
expect_error(sandbox('system("cat /etc/passwd")'))
expect_error(sandbox('get(paste("","y", "tem", sep="s"))("whoami")'))
})

test_that('paste/sprintf created functions', {
expect_error(sandbox(c("x1 <- 's'", "x2 <- 'y'", "x3 <- 't'", "x4 <- 'e'", "x5 <- 'm'", "x <- paste(x1, x2, x1, x3, x4, x5, sep = '')", "lm(sprintf(\"%s('echo hello > /tmp/xxx') ~ 1\", x))")))
expect_error(sandbox('paste("as.numeric(system(\'ls -la | wc -l\', intern=T)) ~ 1")'))
})

test_that('forked functions', {
expect_error(sandbox(c("x <- system", "x('ls')")))
})

test_that('lm', {
expect_error(sandbox('lm("as.numeric(system(\'ls -la | wc -l\', intern=T)) ~ 1")'))
})
@@ -0,0 +1,27 @@
\name{commands.blacklist}
\alias{commands.blacklist}
\title{Blacklisted functions}
\usage{
commands.blacklist(pkg)
}
\arguments{
\item{pkg}{package name(s) where to look for blacklisted
functions. All packages' functions will be returned in a
list if not set.}
}
\value{
vector or list of function names
}
\description{
Blacklisted functions
}
\note{
Only base is added ATM.
}
\examples{
\dontrun{
commands.blacklist()
commands.blacklist('base')
}
}
@@ -0,0 +1,28 @@
\name{paste.masked}
\alias{paste.masked}
\title{Masked paste}
\usage{
paste.masked(..., sep = "", collapse = NULL)
}
\arguments{
\item{...}{see \code{paste}}

\item{sep}{see \code{paste}}

\item{collapse}{see \code{paste}}
}
\description{
Checks for forbidden function calls in constructed
character vector before returning result.
}
\examples{
\dontrun{
paste.masked('sys', 'tem', '(', sep = '')
paste.masked('xsx sax s system( dasf asf as url(')
paste.masked(c(letters[c(19, 25, 19, 20, 5, 13)], '('), collapse = "")
}
}
\seealso{
commands.blacklist
}

@@ -0,0 +1,20 @@
\name{paste0.masked}
\alias{paste0.masked}
\title{Masked paste0}
\usage{
paste0.masked(..., collapse = NULL)
}
\arguments{
\item{...}{see \code{paste0}}

\item{collapse}{see \code{paste0}}
}
\description{
Masked paste0
}
\examples{
\dontrun{
paste0.masked('sys', 'tem', '(')
}
}

@@ -0,0 +1,23 @@
\name{sandbox}
\alias{sandbox}
\title{Eval in sandbox}
\usage{
sandbox(src)
}
\arguments{
\item{src}{character vector of R commands}
}
\description{
Eval in sandbox
}
\examples{
\dontrun{
sandbox('paste(rev(c(")", "whatever", "(", "m", "e", "t", "s", "y", "s")), sep = "", collapse = "")')
sandbox('get(paste("","y", "tem", sep="s"))("whoami")')
sandbox(c("x1 <- 's'", "x2 <- 'y'", "x3 <- 't'", "x4 <- 'e'", "x5 <- 'm'", "x <- paste(x1, x2, x1, x3, x4, x5, sep = '')", "lm(sprintf(\\"\%s('echo hello > /tmp/xxx') ~ 1\\", x))"))
sandbox('paste("as.numeric(system(\\'ls -la | wc -l\\', intern=T)) ~ 1")')
sandbox(c("x <- system", "x('ls')"))
sandbox('lm("as.numeric(system(\\'ls -la | wc -l\\', intern=T)) ~ 1")')
}
}

@@ -0,0 +1,9 @@
\docType{package}
\name{sandboxR}
\alias{sandboxR}
\alias{sandboxR-package}
\title{\emph{sandboxR}: Filtering "malicious" calls in R}
\description{
\emph{sandboxR}: Filtering "malicious" calls in R
}

@@ -0,0 +1,15 @@
\name{sprintf.masked}
\alias{sprintf.masked}
\title{Masked sprintf}
\usage{
sprintf.masked(fmt, ...)
}
\arguments{
\item{fmt}{see \code{sprintf}}

\item{...}{see \code{sprintf}}
}
\description{
Masked sprintf
}
