shoosh

Execute commands in a Docker container while mapping volume paths.

Portability and abstraction are the two main goals guiding the implementation of the library. The idea is to run a command inside a container as if it were installed on the host system (locally), and to be able to use the very same (command) calls both from the host and from inside the container.

How does it work?

  • Map the container and its volumes (host:container);
  • Wrap a command with sh;
  • Execute the command while mapping the volumes (see the sketch below).
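
Putting the three steps together -- a compact sketch only, each step is detailed in the next section; it assumes a running container named the_container with a host directory mounted inside it:

>>> import shoosh
>>> sh = shoosh.init('the_container')    # map the container and its volumes
>>> echo = sh.wrap('echo')               # wrap a command available in the container
>>> echo("Map path:", '/tmp/host/path')  # run it; host paths are translated
Map path: /some/container/path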

How to use

1: Instantiate a container

The first step is to instantiate the (Docker) container we are going to use. (Use whatever method/interface you are used to; the lib is not meant to manage the container's lifecycle.)

For example, let's run a simple debian container mapping the local /tmp/host/path directory to /some/container/path inside the container:

$ docker run -dt \
        --name the_container \
        -v /tmp/host/path:/some/container/path \
        debian

2: Create a handle to the container

Then, we create a handle to the container with its volume(s) set:

>>> import shoosh
>>> sh = shoosh.init('the_container')

3: Wrap a command

Create a "wrap" around a command in the container:

>>> echo = sh.wrap('echo')

4: Run the command, map volumes

When we run the command, we specify the path as if the command were running locally, on the host; shoosh maps the paths accordingly, abstracting the container away.

>>> echo("Map path:", '/tmp/host/path')
Map path: /some/container/path
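
The same mapping applies to commands that actually read files shared through the volume. A hypothetical follow-up, assuming a file hello.txt was created under the mounted host directory /tmp/host/path:

>>> cat = sh.wrap('cat')
>>> cat('/tmp/host/path/hello.txt')  # rewritten to /some/container/path/hello.txt inside the container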

Examples

TBD

Rationale

The typical use of (Docker) containers is to spin up a container with the software packages we need for a given task, process the data, and eventually get the results back to the host system.

When we run (Docker) containers, though, the filesystem structure inside the container is completely detached from that of the host system. The way to exchange data files between host and container is through container mount points -- or volumes.

In practice, this means that the paths we see inside a container -- where our data sits -- are different from those on the host system -- where the very same data files are stored and shared with the container through volumes.

Consider the following scenario:

  • For some reason we use an osgeo/gdal container to process geographical data;
  • The data in our host system is stored under /home/user/data/some_project;
  • When we instantiate the osgeo/gdal image as the some_gdal container, we bind-mount that path to the container's /data/geo;
  • Suppose all we want to do is to convert our data files from GeoTIFF to [Cloud Optimized GeoTIFF](https://gdal.org/drivers/raster/cog.html):
$ gdal_translate /data/geo/raster.tif /data/geo/raster_cog.tif -of COG

Everything works very well if the interaction with the container is decoupled from the host system, i.e., if we execute the commands from inside the container. But if we need to request the execution of a command inside the container from the host system, we have to be aware of the different paths resulting from the host-container volume binding.

The demand for requesting processing tasks (from the host system) to execute inside a container has different motivations. It can be the personal wish to keep the host system's software set clean, plus the convenience of not stepping into the container every time a simple, atomic task has to be performed. But it can also be part of a more complex software application running on the host system -- like a data processing pipeline -- where different components demand different third-party software tools (available through containers).
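
Coming back to the gdal_translate scenario: with shoosh the same conversion could be requested from the host using host-side paths. This is only a sketch under the assumptions above (a running some_gdal container with /home/user/data/some_project bind-mounted at /data/geo); the flags are passed as plain string arguments:

>>> import shoosh
>>> sh = shoosh.init('some_gdal')
>>> gdal_translate = sh.wrap('gdal_translate')
>>> gdal_translate('/home/user/data/some_project/raster.tif',
...                '/home/user/data/some_project/raster_cog.tif',
...                '-of', 'COG')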

Motivation

In my case, the reason I wrote this software is that I use Python to transform planetary (Mars) geo-located data in different ways, using both OSGeo's GDAL and USGS's ISIS software packages, in a couple of cloud computing projects (GMAP, NEANIAS). For my own convenience, I had to find a way to merge data visualization, metadata selection, download, cleaning, map-projection, formatting, etc., across the host system and containers seamlessly through Python.

Application (value)

This (Python) library's primary feature is the execution of shell commands inside (Docker) containers with the correct re-mapping of paths on the fly. It also provides a seamless interface to execute shell commands in the current environment; re-mapping of paths is not necessary in this case, but it allows you to move your code from a "host-container" scenario to a "container-only" one -- in a cloud infrastructure -- seamlessly.

All this seamlessness is possible because of the sh package.
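
For reference, plain sh (without shoosh) already lets you call local commands as Python functions; shoosh adds the container execution and the path mapping on top of it:

>>> import sh
>>> print(sh.echo("hello from the host"))
hello from the host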
