sbox

Simple Sandbox

Problem

We can't reliably tell whether code works without executing it, but executing untrusted and quite-possibly-buggy code generated by a random LLM directly on our development machines is undesirable.

Solution

  • Network and filesystem isolation provided by Docker
  • Timeout provided by a bash exec wrapper
  • Eval wrapper to capture errors, format outputs and always return a consistent result

Implementation

sandbox.py (intended to be used as a library)

  • extract_function_info(language, code): performs static analysis on potentially non-working code that implements a function. Returns a { name, args[] } object containing the function's name and a list of its arguments.
  • FunctionSandbox(code, language): high-level Docker sandbox class. Use the .call(..) method to invoke the untrusted function.
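A minimal sketch of how extract_function_info might work, assuming a regex-based approach (how sandbox.py actually does it isn't described here). A regex only needs the signature line to be intact, so it still finds the name and arguments when the function body would not parse, which a full parser such as Python's ast module cannot offer. This is an illustration, not the project's code: it only recognises Python def and JavaScript function declarations, and it will mis-split default values that themselves contain commas or parentheses.

```python
import re

# Hypothetical sketch: one signature regex per language. Regexes only need the
# signature line to be intact, so they still work when the body is broken.
_SIGNATURES = {
    "python": re.compile(
        r"^\s*(?:async\s+)?def\s+([A-Za-z_]\w*)\s*\(([^)]*)\)", re.MULTILINE
    ),
    "javascript": re.compile(
        r"\b(?:async\s+)?function\b\s*\*?\s*([A-Za-z_$][\w$]*)\s*\(([^)]*)\)"
    ),
}


def extract_function_info(language, code):
    """Return {"name": ..., "args": [...]} for the first function found, else None."""
    pattern = _SIGNATURES.get(language)
    if pattern is None:
        raise ValueError(f"unsupported language: {language}")
    match = pattern.search(code)
    if match is None:
        return None
    name, raw_args = match.group(1), match.group(2)
    args = []
    for part in raw_args.split(","):
        # Keep only the parameter name: drop defaults ("b=1"), annotations
        # ("b: int"), Python splats ("*args", "**kw") and JS rest ("...rest").
        arg = re.split(r"[=:]", part)[0].strip().lstrip("*").lstrip(".")
        # Skip empty pieces and Python's bare "*" / "/" parameter markers.
        if arg and arg != "/":
            args.append(arg)
    return {"name": name, "args": args}
```

For example, extract_function_info("python", "def add(a, b=2):\n    return a +") returns {"name": "add", "args": ["a", "b"]} even though the body is a syntax error.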

timeout.sh

Bash implementation of the timeout layer.
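The contents of timeout.sh aren't shown here. As a hypothetical illustration of the same layer in Python, subprocess.run accepts a timeout, kills the child when it expires and raises subprocess.TimeoutExpired; the run_with_timeout helper below (an illustrative name, not part of the project) folds both outcomes into the same dict shape so callers never see an exception.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, stdin_text=""):
    """Run cmd, killing it after timeout_s seconds; always returns a dict."""
    try:
        proc = subprocess.run(
            cmd,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child before re-raising.
        return {"timed_out": True, "returncode": None, "stdout": "", "stderr": ""}
    return {
        "timed_out": False,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


if __name__ == "__main__":
    # An infinite loop is cut off instead of hanging the caller.
    print(run_with_timeout([sys.executable, "-c", "while True: pass"], 1))
```

One caveat if the wrapped command is docker run itself: killing the docker client process does not necessarily stop the container it started, which is one reason to enforce the timeout inside the container rather than only around it.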

Dockerfile.javascript, Dockerfile.python

Docker implementations of the isolation layer.
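A Dockerfile only defines the image; network and filesystem restrictions are applied when the container is started. The sketch below builds a docker run command line with standard docker run options that match the isolation described above. Which flags this project actually passes is an assumption, as are the docker_run_argv helper name and the sbox-python image name used in the example.

```python
def docker_run_argv(image, command, memory="256m", cpus="1", pids="64"):
    """Build a hypothetical `docker run` argv with networking and writes locked down."""
    return [
        "docker", "run",
        "--rm",               # delete the container when it exits
        "-i",                 # keep stdin open so the eval wrapper can read input
        "--network", "none",  # no network access (loopback only)
        "--read-only",        # mount the container's root filesystem read-only
        "--tmpfs", "/tmp",    # in-memory scratch space for code that must write
        "--memory", memory,   # hard memory cap
        "--cpus", cpus,       # CPU quota
        "--pids-limit", pids, # cap process count, e.g. against fork bombs
        image,
        *command,
    ]
```

For example, docker_run_argv("sbox-python", ["python3", "/sandbox/eval.py"]) yields an argv that can be passed straight to subprocess.run, so no shell quoting is involved.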

eval.javascript.tpl, eval.python.tpl

Eval wrappers.
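A sketch of what the Python eval wrapper might do once the untrusted function has been loaded, assuming a result shape of {ok, result, error} (the real template's field names aren't given here, and eval_wrapper is a hypothetical name). Catching BaseException rather than Exception also traps SystemExit, so untrusted code that calls exit() still yields a well-formed result, and a return value that can't be serialized to JSON is reported as an error instead of crashing the wrapper.

```python
import json
import traceback


def eval_wrapper(fn, args):
    """Call fn(*args) and return a JSON string with the same shape every time."""
    result = {"ok": False, "result": None, "error": None}
    try:
        value = fn(*args)
        json.dumps(value)  # fail inside the try if the value isn't serializable
        result["ok"] = True
        result["result"] = value
    except BaseException as exc:  # includes SystemExit raised by exit()
        result["error"] = {
            "type": type(exc).__name__,
            "message": str(exc),
            "traceback": traceback.format_exc(),
        }
    return json.dumps(result)
```

For example, json.loads(eval_wrapper(lambda a, b: a + b, [1, 2])) gives {"ok": True, "result": 3, "error": None}, while a function that divides by zero gives ok set to False and an error of type ZeroDivisionError.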

Assumptions

  • code contains a single function, written in language
  • the language we are working with supports try/catch, lists and objects
  • the language we are working with can serialize JSON