File system BLOB store

Jens Reimann edited this page May 7, 2015 · 3 revisions

This page is about the file system BLOB store, which will appear in Package Drone 0.6.0.

The problem

By default, Package Drone stores the content of artifacts (BLOBs) in the database. This has two advantages. First, the BLOB is part of the entity that makes up the artifact, so if the artifact is deleted (or not created due to a rollback), the BLOB goes with it. Second, you only need to back up the database, and you will have all the BLOBs with it.

Sadly, the JDBC drivers of both MySQL and PostgreSQL do not fully support streaming of binary data. JDBC does provide an API for streaming a large binary object into the database, so that the application is not required to keep the full BLOB in main memory. Both drivers implement this API, but they have a few bugs, so the full BLOB is still handled in the memory of the application.

MySQL supports streaming at the API level when storing data, but the packet for the server is first serialized completely in main memory before it is sent to the socket. So it is not the artifact itself that fills the main memory, but the packet buffer. Reading with MySQL is even worse: first the full packet is read, and then it is converted to a BLOB, all inside memory. This takes about three times the size of the BLOB, so reading a 400 MB file from the database consumes about 1.2 GB of RAM.

PostgreSQL does a little better. Writing is fully streamed. But when reading the data, the server pushes the full BLOB to the client without any sort of flow control, so the client simply gets overloaded with incoming data. In the end, neither of the two works reliably.
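For contrast with the driver behavior described above, here is a minimal sketch of what streaming is supposed to achieve: data is moved in fixed-size chunks, so memory use is bounded by the buffer size and not by the size of the BLOB. The class `StreamCopy` and its method are hypothetical and only illustrate the idea; they are not part of Package Drone or of any JDBC driver.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper for illustration; not part of Package Drone. */
final class StreamCopy {
    private StreamCopy() {
    }

    /**
     * Copies all bytes from in to out using one buffer of bufferSize bytes.
     * At no point is more than one buffer of data held by this method.
     *
     * @return the number of bytes copied
     */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

With an 8 KB buffer, a 400 MB file passes through this loop while only 8 KB are held at a time, which is the property the drivers fail to preserve.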

Note: Although we never tested it, this should work with Oracle. So maybe somebody is interested in implementing this :wink:

The solution

The solution Package Drone provides is a file system overlay backend. By default, everything is stored in the database. But it is possible to activate a file system overlay, which then redirects everything to a location in the file system. If the artifact data cannot be found in the file system, the database is tried. This allows one to activate the file system layer at a later time.
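The lookup order described above can be sketched roughly as follows. This is only an illustration under assumptions: the class `OverlayBlobStore`, its method names, the one-file-per-artifact layout, and the `Function` standing in for the database lookup are all hypothetical and do not reflect the actual Package Drone implementation. It also assumes artifact IDs are safe to use as file names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Function;

/** Hypothetical sketch of an overlay store; not Package Drone's real classes. */
final class OverlayBlobStore {
    private final Path baseDir; // location of the file system overlay
    private final Function<String, InputStream> database; // stand-in for the database lookup, returns null if absent

    OverlayBlobStore(Path baseDir, Function<String, InputStream> database) {
        this.baseDir = baseDir;
        this.database = database;
    }

    /** New BLOBs are redirected to the file system once the overlay is active. */
    void store(String artifactId, InputStream data) throws IOException {
        Files.copy(data, baseDir.resolve(artifactId), StandardCopyOption.REPLACE_EXISTING);
    }

    /** Tries the file system first and falls back to the database. */
    InputStream open(String artifactId) throws IOException {
        Path file = baseDir.resolve(artifactId);
        if (Files.isRegularFile(file)) {
            return Files.newInputStream(file);
        }
        return database.apply(artifactId); // may be null if the BLOB is nowhere
    }
}
```

Because of the fallback in `open`, BLOBs written to the database before activation stay readable without any migration step.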

Limitations

There are a few limitations though. Once activated, it is not possible, as of now, to simply deactivate it again. This would require importing all file system BLOBs into the database.

Also, BLOBs that are already inside the database are not spooled out into the file system.

It is possible to export the channels using the export (or export all) functions of Package Drone. Exporting and re-importing will re-create all BLOBs, and therefore store the BLOBs in either the file system or the database. So it is possible to switch back to the database, or to spool out the BLOBs to the file system; it simply is a bit of manual work.

What can go wrong?

A lot, but that is always the case :wink:

If the file system location disappears (for example, because files somehow get deleted), the artifacts cannot be read anymore. Once the files reappear, you can access them normally again.

Package Drone stores two things in the database once the file system overlay is activated: a unique ID and the location. This way it is possible to open the file system store on startup and check whether it is the right store (based on the ID). If it is not, Package Drone will consider the BLOB files missing.

It is possible to relocate the directory of the file system store using Package Drone. The directory must be moved manually. Afterwards, Package Drone can be told to relocate as well. It will test whether the new directory matches the ID and then change the location inside the database.

This also prevents a relocation to the file location of a different Package Drone instance.
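The ID check on startup and on relocation could look roughly like the following sketch. How Package Drone actually records the ID inside the store directory is not described here, so the marker file name `store.id`, the class `StoreIdCheck`, and all method names are assumptions made purely for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch; the marker file and its name are assumptions. */
final class StoreIdCheck {
    static final String MARKER = "store.id"; // assumed file holding the store's unique ID

    private StoreIdCheck() {
    }

    /** Writes the unique ID into a newly created store directory. */
    static void initialize(Path storeDir, String storeId) throws IOException {
        Files.createDirectories(storeDir);
        Files.write(storeDir.resolve(MARKER), storeId.getBytes(StandardCharsets.UTF_8));
    }

    /** Returns true if the directory carries the ID recorded in the database. */
    static boolean matches(Path storeDir, String expectedId) throws IOException {
        Path marker = storeDir.resolve(MARKER);
        if (!Files.isRegularFile(marker)) {
            return false; // no marker: treat the BLOB files as missing
        }
        String found = new String(Files.readAllBytes(marker), StandardCharsets.UTF_8).trim();
        return found.equals(expectedId);
    }

    /** Returns the location to record in the database, refusing foreign stores. */
    static Path relocate(Path newDir, String expectedId) throws IOException {
        if (!matches(newDir, expectedId)) {
            throw new IllegalStateException("Directory does not belong to this store: " + newDir);
        }
        return newDir.toAbsolutePath();
    }
}
```

In this sketch, `relocate` only returns the new location after the ID matches, so pointing one instance at the store directory of another instance is rejected before anything in the database changes.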

Things to consider

Backing up your installation gets a bit more tricky once the file system layer has been activated, because besides the database it is now also required to back up the file system storage. The file system overlay is not a cache for quicker access, but a requirement, since once activated it is the only location where the BLOBs are stored.

It is possible to export and import the channels using the functions of Package Drone. However, this does not back up the full database either, but only the channels (and not the user base, configuration or deploy keys).

How to activate it

Go to "System" -> "Storage" in the main menu. On the right side there is a box labeled "File system BLOB store". Enter the file system directory into the entry field and press the "Convert" button.

If the box is already labeled "Relocate file storage", then the BLOB store was already converted.
