mongofile:usage:/how/to/use/this.lib
=======

This library uses a different approach to accessing files and is more Java-centric than the GridFS implementation in the mongo-java-driver. Rather than having to cast file objects, I made the decision to prefer static compile-time type checking and simplified object APIs over what is currently available from GridFS (2.11).

The reading and writing functions are separated from the main store code to keep usage simpler for the most common reads and writes.

Gzip compression is applied automatically based on the media type of the file unless it is turned off in the configuration. Most media types will be stored compressed to save bytes (even whole chunks) in the store. This is handled transparently, and the user does not need to know when compression is involved.

Optional encryption is also supported by providing the Encryption class to perform the encryption and decryption of file chunks while storing and reading the file. The use of encryption is automatically detected from the storage format stored with the file; however, the correct decryption algorithm must be provided in the configuration in order to read the file successfully.

TTL file expiration (deletion) also allows the store to be used as a cache for objects that have a finite lifespan.

URL

A mongofile URL protocol has also been implemented to allow a single string to represent all the info needed to fetch the file back, as well as give some metadata to the user without having to query the store for basic metadata. Several examples look like this

mongofile:/home/me/foo/filename.zip?52fb1e7b36707d6d13ebfda9#application/zip
mongofile:gz:/mypath/fileName.pdf?52fb1e7b36707d6d13ebfda9#application/pdf
mongofile:enc:/mypath/fileName2.pdf?52fb1e7b36707d6d13ebfda9#application/pdf

For the second example above, the parts of the URL represent :

| Part | Meaning |
| --- | --- |
| mongofile | the protocol |
| gz | compression was used to save space when storing the file |
| /mypath/fileName.pdf | the file path and name |
| 52fb1e7b36707d6d13ebfda9 | the unique id generated by the MongoDB driver for this object |
| application/pdf | the media type for the data |
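
To illustrate how the pieces fit together, the hypothetical snippet below pulls the parts out of a URL string by hand; in real code, use the library's MongoFileUrl class (shown later in this document) rather than parsing the string yourself.

String url = "mongofile:gz:/mypath/fileName.pdf?52fb1e7b36707d6d13ebfda9#application/pdf";

String mediaType = url.substring(url.indexOf('#') + 1);               // application/pdf
String id = url.substring(url.indexOf('?') + 1, url.indexOf('#'));    // 52fb1e7b36707d6d13ebfda9
String rest = url.substring(url.indexOf(':') + 1, url.indexOf('?'));  // gz:/mypath/fileName.pdf
boolean compressed = rest.startsWith("gz:");
boolean encrypted = rest.startsWith("enc:");
String path = (compressed || encrypted) ? rest.substring(rest.indexOf(':') + 1) : rest; // /mypath/fileName.pdf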

The API

The API for this library has changed a bit from the 0.7.x version to allow for migration to the new objects that are present in the upcoming mongo-java-driver 3.0.x. It will require some minor refactoring of your existing code that should not be too painful.

This library now uses surrogate objects that conform to the same objects used by the 3.0.x driver, so that when the 3.0.x driver is released, this library will be ready to work with it. I plan for this library to go 1.0.x shortly after the 3.0.x driver is released.

Standing up a MongoFileStore

MongoClient and the database

Configure the connection to the MongoDB server and database in whatever fashion is available to you. Consult the MongoClient class from the driver for more info.

The database object below is a normal com.mongodb.DB or org.mongodb.MongoDatabase object.
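
For example, a minimal sketch with the 2.x driver might look like this (the host, port, and database name are placeholders):

import com.mongodb.DB;
import com.mongodb.MongoClient;

// connect to the server and get the DB handle to hand to the file store
MongoClient mongoClient = new MongoClient("localhost", 27017);
DB database = mongoClient.getDB("myDatabase");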

Configuration

MongoFileStoreConfig config = MongoFileStoreConfig.builder()// start builder
        .bucket(bucket) // <bucket>.files and <bucket>.chunks
        .chunkSize(ChunkSize.medium_256K) // default
        .enableCompression(true) //  default
        .enableEncryption(null) // provide an encryption/decryption instance 
        .writeConcern(WriteConcern.SAFE) // default
        .readPreference(ReadPreference.primary()) // default
        .build(); // generate the configuration

MongoFileStore store = new MongoFileStore(database, config);

Keep the store handy; it is the core of all operations on files stored in mongoFS. If you are using Spring, use Java or XML configuration to make it a singleton Spring bean and inject this object where you need to access files.
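
If you do use Spring, a minimal Java-configuration sketch might look like the following. The bean names, bucket, database name, and host are placeholders, and you will also need the imports for MongoFileStore and MongoFileStoreConfig from this library.

import com.mongodb.DB;
import com.mongodb.MongoClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MongoFsConfig {

    @Bean
    public MongoClient mongoClient() throws Exception {
        return new MongoClient("localhost", 27017);
    }

    @Bean
    public MongoFileStore mongoFileStore(MongoClient mongoClient) {
        DB database = mongoClient.getDB("myDatabase");
        MongoFileStoreConfig config = MongoFileStoreConfig.builder() //
                .bucket("files") // <bucket>.files and <bucket>.chunks
                .build();
        return new MongoFileStore(database, config); // singleton, inject where needed
    }
}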

NOTE : Compression and encryption may be used at the same time; you can choose one, the other, or both on a per-file-store basis.

NOTE : If both are enabled, compression is applied first, then the compressed bytes are encrypted before chunking and saving.
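
For illustration only, that ordering looks roughly like the sketch below, written with standard JDK classes; this is not the library's internal code, and the AES key handling is a placeholder.

import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// 1. gzip-compress the chunk bytes
byte[] chunk = "some chunk data".getBytes();
ByteArrayOutputStream compressed = new ByteArrayOutputStream();
try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
    gzip.write(chunk);
}

// 2. encrypt the compressed bytes (AES used purely as an example)
SecretKey key = KeyGenerator.getInstance("AES").generateKey();
Cipher cipher = Cipher.getInstance("AES");
cipher.init(Cipher.ENCRYPT_MODE, key);
byte[] stored = cipher.doFinal(compressed.toByteArray()); // these bytes are what gets chunked and saved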

GridFS-compatible configuration

MongoFileStoreConfig config = MongoFileStoreConfig.builder().gridFSCompatible("test");

MongoFileStore store = new MongoFileStore(database, config);

File expiration

A file expiration timestamp can be placed on the file if storage in the file store is temporary. This expiration utilizes the MongoDB TTL index feature, so the removal of files is dependent on the server-side background process being enabled on the server. Expired files still in the file store will still be served up if requested before the server process can remove the files and chunks.

// When a file is created 
Date expiresAt = TimeMachine.from(0).forward(3).days().inTime();

MongoFileWriter writer = store.createNew(filename, mediaType, expiresAt, true);
writer.write(new ByteArrayInputStream(LoremIpsum.LOREM_IPSUM.getBytes()));

// after a file is created
MongoFile first = store.find("/foo/bar.txt").next();
store.expireFile(first,  expiresAt);

NOTE : A simple Date manipulation DSL exists in the library called TimeMachine. It has its limits but should be sufficient for most needs. You can also combine different units of measure ( days, hours, minutes, ...) to come up with some unique situations. For example :

TimeMachine.now().backward(2).days().forward(5).hours().backward(10).minutes().inTime();

File I/O

Writing files into the store

MongoFileWriter writer = store.createNew("README.md", "text/plain", true);
writer.write(new ByteArrayInputStream(LOREM_IPSUM.getBytes()));
MongoFile file = writer.getMongoFile();
URL url = file.getURL();

System.out.println(url);

would print the following to stdout

mongofile:gz:README.md?52fb1e7b36707d6d13ebfda9#text/plain

where the parts represent :

| Part | Meaning |
| --- | --- |
| mongofile | the protocol |
| gz | compression was used to save space when storing the file |
| README.md | the file path and name |
| 52fb1e7b36707d6d13ebfda9 | the unique id generated by the MongoDB driver for this document |
| text/plain | the media type for the data |

Store the URL string however you like and use it to fetch the file back from the store when it is needed.

REMEMBER : Filenames are NOT unique in the system; you can have many files with the same path and file name.

Zip Archive Expansion

Support is available to have a zip archive automatically expanded into individual files during the upload process, returning a MongoManifest with a list of all the files in the zip archive.

MongoFileWriter writer = store.createNew(filename, "application/zip");

MongoManifest manifest = writer.uploadZipFile(new FileInputStream(filename));

assertEquals(filename, manifest.getZip().getFilename());
assertEquals(3, manifest.getFiles().size());

MongoFile file1 = manifest.getFiles().get(0);
assertEquals("file1.txt", file1.getFilename());

Both compression and encryption will be applied, if configured and applicable, to each file read from the zip archive. No data is stored on the archive file itself, so the data is only stored once.

Expanding the zip archive results in unique files being added to the file store, and currently there is no support for reading the original archive back out of the store. You can obtain a manifest from the original file as follows:

MongoManifest manifest2 = store.getManifest(mongoFileUrl);

assertTrue(manifest2.getZip().isExpandedZipFile());
assertEquals(3, manifest2.getFiles().size());

MongoFile file1 = manifest2.getFiles().get(0);

Each file can then be read from the file store individually.
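
For example, using the reader API shown later in this document, iterating the manifest and copying each file out might look like this (the output stream here is just a placeholder destination):

ByteArrayOutputStream out = new ByteArrayOutputStream(32 * 1024);
for (MongoFile file : manifest2.getFiles()) {
    InputStream in = new MongoFileReader(store, file).getInputStream();
    new BytesCopier(in, out).transfer(true); // copy this file's bytes to the output
    System.out.println(file.getFilename() + " : " + out.size() + " total bytes copied so far");
}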

Finding and reading files

Using a stored URL string

MongoFileUrl url = MongoFileUrl.construct("mongofile:gz:README.md?52fb1e7b36707d6d13ebfda9#text/plain");
MongoFile mongoFile = store.findOne(url); // lookup the file by its url

ByteArrayOutputStream out = new ByteArrayOutputStream(32 * 1024);
mongoFile.readInto(out, true); // true == flush output stream when done

String fileText = out.toString();

You can still read files by file name from the store as well.

ByteArrayOutputStream out = new ByteArrayOutputStream(32 * 1024);

MongoFileCursor cursor = store.find("/foo/bar1.txt");
int count = 0;
for (MongoFile mongoFile : cursor) {
    ++count;
    assertNotNull(mongoFile.getURL());
    assertEquals("/foo/bar1.txt", mongoFile.getFilename());
    InputStream in = new MongoFileReader(store, mongoFile).getInputStream();
    new BytesCopier(in, out).transfer(true); // append more than one file together
}
assertEquals(2, count);
assertEquals(LoremIpsum.LOREM_IPSUM.length() * 2, out.toString().length());

Removing files from the store

String storedSomewhereElse = "mongofile:gz:README.md?52fb1e7b36707d6d13ebfda9#text/plain";

MongoFileUrl url = MongoFileUrl.construct(storedSomewhereElse);

boolean async = true;
store.remove(url, async); // mark the file for deletion

You can asynchronously delete files if the store is configured for that, or express it explicitly as in the code above. This feature utilizes the TTL index feature on MongoDB collections to delete the files, so it may take 60 seconds or longer to actually delete the file.

NOTE : Even if a file has been deleted but not yet removed by the server, mongoFS will not return the file object to the user, so it is effectively treated as if it had already been removed from the repository.

Where to find more

Check out the integration tests in the source code; there are many examples of how to do things with mongoFS. I am trying to cover all of the features in the library with integration tests.