FileStore

Patrick Hochstenbach edited this page Jun 22, 2017 · 19 revisions

Stores are Catmandu packages to store Catmandu Items in a database. A FileStore is a Store where you can store binary content (unstructured data). Out of the box, one FileStore implementation is provided: File::Simple, which stores files in a directory structure on the local file system.

The command below stores the file /tmp/myfile.txt in the File::Simple FileStore, in the "container" 1234, under the file identifier myfile.txt:

$ catmandu stream /tmp/myfile.txt to File::Simple --root t/data --bag 1234 --id myfile.txt

The root parameter is mandatory for the File::Simple FileStore. It defines the location where all stored files are written. The other two parameters bag and id are mandatory for every FileStore (see below).

To extract a file from a FileStore the stream command can be used in the opposite direction:

$ catmandu stream File::Simple --root t/data --bag 1234 --id myfile.txt to /tmp/myfile.txt

This extracts the file myfile.txt from the container with identifier 1234 in the File::Simple FileStore.

Every FileStore inherits the functionality of a Store. In this way the drop and delete commands can be used to delete data from a FileStore:

# Delete a "file"
$ catmandu delete File::Simple --root t/data --bag 1234 --id myfile.txt

# Delete a "folder"
$ catmandu drop File::Simple --root t/data --bag 1234

Bag

A FileStore contains one or more Bags. These Bags are containers (or "folders") that hold zero or more files. The name of such a container, given with the bag option in the Catmandu commands, is an identifier. For File::Simple this identifier needs to be a number or, when the uuid option is set, a UUID.

The binary data (files) stored in these Bags also need an identifier, given with the id option. Usually the file name is a good choice.

Both the bag and id options are required when uploading data to, or streaming data from, a FileStore.

Within a FileStore Bag no deeper hierarchy is possible: a Bag contains a flat list of files. To store deeply nested folders and files, pack them into an archive such as a ZIP file and store that archive as a single file:

$ zip -r /tmp/files.zip /mnt/data/files
$ catmandu stream /tmp/files.zip to File::Simple --root t/data --bag 1234 --id files.zip

Index

Every FileStore has a default Bag called index, which contains a list of all available Bags in the store (like a listing of all folders). Using the export command, a listing of Bags can be requested from the FileStore:

$ catmandu export File::Simple --root t/data to YAML

To retrieve a listing of all files stored in a bag the bag option needs to be provided:

$ catmandu export File::Simple --root t/data --bag 1234 to YAML
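The relationship between the index and the Bags can be pictured with a small Python sketch. This is a conceptual model only, not Catmandu code: the dictionary layout and the helper names list_bags and list_files are assumptions made for illustration.

```python
# Conceptual model only: this is not how Catmandu is implemented.
# A store maps bag identifiers to a flat mapping of file identifiers to bytes.
store = {
    "1234": {"myfile.txt": b"hello", "files.zip": b"PK"},
    "1235": {},
}


def list_bags(store):
    """Mimic exporting the index: one record per bag."""
    return [{"_id": bag_id} for bag_id in sorted(store)]


def list_files(store, bag_id):
    """Mimic exporting one bag: one record per file, with no nesting."""
    return [
        {"_id": file_id, "size": len(data)}
        for file_id, data in sorted(store[bag_id].items())
    ]


bags = list_bags(store)
files = list_files(store, "1234")
print(bags)
print(files)
```

Exporting without a bag option corresponds to list_bags, and exporting with a bag option corresponds to list_files for that Bag.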

Technical Metadata

Each Bag ("container") in a FileStore contains at least the _id as metadata. Some FileStores may contain more metadata. To retrieve a listing of all containers use the export command on the FileStore:

$ catmandu export File::Simple --root t/data 
[{"_id":"1234"},{"_id":"1235"},{"_id":"1236"}]

Every "file" in a FileStore contains at least the following fields:

  • _id : the name of the file
  • _stream : a callback function to download the contents of the file (pass it an IO::Handle)
  • created : the creation date time of the file as a UNIX timestamp
  • modified : the last modification date time of the file as a UNIX timestamp
  • content_type : the content type of the file
  • size : the file size in bytes
  • md5 : an MD5 checksum if the FileStore supports it, or an empty string
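To make the shape of such a record concrete, here is a minimal Python sketch that builds a dictionary with the field names listed above for a local file. It is not the File::Simple implementation: the function name describe_file is hypothetical, the created value uses st_ctime as a stand-in (on Unix this is the inode change time, not a true creation time), and the content type is guessed from the file name.

```python
import hashlib
import io
import mimetypes
import os
import tempfile


def describe_file(path):
    """Build a dict with the FileStore field names for a local file."""
    info = os.stat(path)
    with open(path, "rb") as handle:
        digest = hashlib.md5(handle.read()).hexdigest()

    def stream(out):
        # Copy the file contents to a writable binary handle.
        with open(path, "rb") as handle:
            out.write(handle.read())

    return {
        "_id": os.path.basename(path),
        "_stream": stream,
        # Assumption: st_ctime stands in for the creation time.
        "created": int(info.st_ctime),
        "modified": int(info.st_mtime),
        "content_type": mimetypes.guess_type(path)[0] or "application/octet-stream",
        "size": info.st_size,
        "md5": digest,
    }


# Demonstration on a temporary file.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "myfile.txt")
with open(path, "wb") as handle:
    handle.write(b"hello\n")

record = describe_file(path)
buffer = io.BytesIO()
record["_stream"](buffer)  # the callback writes the content to the handle
```

The _stream entry is a function rather than data, which is why it needs special care when a record is exported.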

NOTE: Not every exporter can serialise the code reference in the _stream field. For instance, when exporting to JSON this error message will show up:

$ catmandu export File::Simple --root t/data --bag 1234
Oops! encountered CODE(0x7f99685f4390), but JSON can only represent references to arrays or hashes at /Users/hochsten/.plenv/versions/5.24.0/lib/perl5/site_perl/5.24.0/Catmandu/Exporter/JSON.pm line 36.

This field can be removed from the output using the remove_field fix:

$ catmandu export File::Simple --root t/data --bag 1234 --fix 'remove_field(_stream)'
[{"_id":"files.pdf","content_type":"application/pdf","modified":1498122646,"md5":"","size":883202,"created":1498122646}]
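The same limitation exists in JSON encoders of other languages. As an analogy only (this is Python, not Catmandu), the sketch below shows that a record holding a function cannot be serialised to JSON, and that dropping the field first, as remove_field does, makes the export succeed. The record contents are made up for the example.

```python
import json

# A record that, like a FileStore file record, carries a callback.
record = {"_id": "files.pdf", "size": 883202, "_stream": lambda out: None}

try:
    json.dumps(record)
    serialisable = True
except TypeError:
    # JSON has no representation for functions.
    serialisable = False

# Drop the callback before serialising, mirroring remove_field(_stream).
cleaned = {key: value for key, value in record.items() if key != "_stream"}
text = json.dumps(cleaned, sort_keys=True)
print(text)
```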

Always use the stream command in Catmandu to extract files from a FileStore:

$ catmandu stream File::Simple --root t/data --bag 1234 --id 'files.pdf' > output.pdf

Configuration

As for Stores, the configuration parameters for FileStore can be written in a catmandu.yml configuration file. In this way the Catmandu commands can be shortened:

$ cat catmandu.yml
---
store:
  files:
    package: File::Simple
    options:
        root: t/data

# Get a "directory" listing
$ catmandu export files to YAML

# Get a "file" listing
$ catmandu export files --bag 1234 to YAML

# Add a file
$ catmandu stream /tmp/myfile.txt to files --bag 1234 --id myfile.txt

# Download a file
$ catmandu stream files --bag 1234 --id myfile.txt to /tmp/myfile.txt
