Mode 1448 - Store binary files in a relational database #440
- Split metadata and payload
- Split payload into chunks
- Added configurable idle time
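For illustration, splitting a payload into fixed-size chunks could look roughly like the sketch below. This is not the code from this pull request: the class name `ChunkSplitSketch`, the methods `split` and `join`, and the in-memory `byte[]` payload are all assumptions made for the example. A real store would more likely stream each chunk to the database instead of holding all of them in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of payload chunking; not the implementation in this PR. */
class ChunkSplitSketch {

    /** Splits the payload into chunks of at most chunkSize bytes, preserving order. */
    static List<byte[]> split(byte[] payload, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < payload.length; offset += chunkSize) {
            int end = Math.min(payload.length, offset + chunkSize);
            chunks.add(Arrays.copyOfRange(payload, offset, end));
        }
        return chunks;
    }

    /** Reassembles the original payload from its ordered chunks. */
    static byte[] join(List<byte[]> chunks) {
        int total = 0;
        for (byte[] chunk : chunks) {
            total += chunk.length;
        }
        byte[] payload = new byte[total];
        int offset = 0;
        for (byte[] chunk : chunks) {
            System.arraycopy(chunk, 0, payload, offset, chunk.length);
            offset += chunk.length;
        }
        return payload;
    }
}
```

With a chunk size of 4, a 10-byte payload would be stored as three chunks of 4, 4 and 2 bytes, and `join` restores the original bytes when the chunks are read back in order.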
- Separated content into chunks
- Added expire time (it should reduce memory usage by offloading idle content from memory)

The disadvantage is that it is still affected by large content:
The safest way is to implement this store using plain JDBC and mutable chunks. Metadata can be cached. The safest solution for large content will be
Plain JDBC version ready.
*/
private static String blob(Connection connection, int size) throws BinaryStoreException {
    try {
        String name = connection.getMetaData().getDatabaseProductName();
I know this method isn't called often (only during initialization), but you might want to lowercase the product name once to avoid repeatedly calling toLowerCase().
fixed
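For readers following along, the suggestion amounts to normalizing the product name once and then branching on the normalized value. A minimal sketch, assuming a simplified signature: `BlobTypeSketch`, `blobType`, and the type names returned for each database are hypothetical choices for illustration, not the mapping used in this pull request, and the `size` parameter of the original method is left out.

```java
import java.util.Locale;

/** Hypothetical sketch of the review suggestion; the type mapping is illustrative only. */
class BlobTypeSketch {

    /** Picks a binary column type from the JDBC database product name. */
    static String blobType(String productName) {
        // Lowercase once, up front, instead of calling toLowerCase() in every comparison.
        // Locale.ROOT keeps the result independent of the JVM's default locale.
        String name = productName.toLowerCase(Locale.ROOT);

        if (name.contains("mysql")) {
            return "LONGBLOB"; // assumed choice for the example
        }
        if (name.contains("postgres")) {
            return "BYTEA";    // assumed choice for the example
        }
        return "BLOB";         // assumed fallback for the example
    }
}
```

In the real method the name would come from `connection.getMetaData().getDatabaseProductName()`, as in the diff above.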
Overall, I really like this new approach. Very nice work, Oleg. As we discussed, this proof of concept is probably ready to be polished, including your suggestion of moving many of the utility methods and potentially customizable methods into a Great work!
I think I found another minor problem with content storage in the latest implementation. It is caused by the key: it is unique for each invocation irrespective of content, so the same content can be stored more than once. I assume we need to store the content in an intermediate table, compute the hash afterwards, and check for existence before the "actual" insert.
I think I understand the issue: we don't know the SHA-1 of the content until we've read it completely, and until we read it we can't know whether the value has been stored. I can think of several approaches:
The FileSystemBinaryStore writes to a temporary file, and once it knows the SHA-1 it looks for an existing file. If a binary value doesn't exist for the SHA-1, it moves the temporary file to its storage area (and thus doesn't have to write it twice).

The interesting thing is that the likelihood of an application storing the same file multiple times varies, and in most cases is probably pretty low. On one hand, that means that most of the time it might be okay to always insert the value into the database (option 1 above), since it is unlikely to clash. On the other, we do have to deal with clashes, and such a design might be more complicated. (For example, we can't make a unique constraint on the SHA-1 column if we want to put NULL into that column, so we'd have to play some tricks with the schema or accept non-unique constraints. Plus, we'd want the schema to be such that the cleanup command is pretty simple; e.g., BTW, our Does that help at all?
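To make the temporary-file approach concrete, here is a rough sketch of the idea: stream the content to a temp file while hashing it, then either move the file into place or discard it if a file for that SHA-1 already exists. This is not ModeShape's FileSystemBinaryStore. The class `TempFileDedupSketch`, its methods, and the flat directory layout keyed by the hex SHA-1 are assumptions for the example, and concurrent writers are not handled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.stream.Stream;

/** Hypothetical sketch of hash-after-write de-duplication; not the real store. */
class TempFileDedupSketch {
    private final Path storageDir;

    TempFileDedupSketch(Path storageDir) {
        this.storageDir = storageDir;
    }

    /** Demo convenience (an assumption): keep everything in a fresh temp directory. */
    static TempFileDedupSketch inTempDirectory() {
        try {
            return new TempFileDedupSketch(Files.createTempDirectory("binary-store"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes content once to a temp file while hashing it; returns the hex SHA-1 key. */
    String store(InputStream content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            Path temp = Files.createTempFile(storageDir, "upload-", ".tmp");
            try (OutputStream out = Files.newOutputStream(temp)) {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = content.read(buffer)) != -1) {
                    sha1.update(buffer, 0, read); // hash and write in the same pass
                    out.write(buffer, 0, read);
                }
            }
            String key = toHex(sha1.digest());
            Path target = storageDir.resolve(key);
            if (Files.exists(target)) {
                Files.delete(temp);       // same content already stored: drop the copy
            } else {
                Files.move(temp, target); // first time seen: move, no second write
            }
            return key;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    /** Number of files currently held in the storage directory. */
    long storedCount() {
        try (Stream<Path> files = Files.list(storageDir)) {
            return files.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String toHex(byte[] digest) {
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

Storing the same bytes twice returns the same key and leaves a single file behind, which is the behaviour a database-backed variant would need to reproduce with an intermediate table or a local temp file.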
(Closed accidentally, so had to reopen. Sorry.) Thinking aloud ... I just thought of a slight variation on the second approach in my previous comment. What if the DatabaseBinaryStore contained a FileSystemBinaryStore as a local cache, and wrote the binary to that cache first? The FileSystemBinaryStore would actually compute the key (and SHA-1). And so that the FileSystemBinaryStore doesn't permanently store the file (we'd want it to be a cache, not a full copy of the values), the DatabaseBinaryStore might immediately mark the value in the FileSystemBinaryStore as unused (meaning the FileSystemBinaryStore would make it available for cleanup). Plus, using it as a cache might mean that recently read BinaryValues are locally available, saving trips to the database. This might not even be as easy as just having the DatabaseBinaryStore write out to a temporary file (kind of like the FileSystemBinaryStore does).
Talking about the likelihood of storing duplicate values, I have a concern about So... My vote is for the approach with FileSystemBinaryStore as a local cache. 2012/7/20 Randall Hauch <
I didn't add any unit tests because the dependency on a local database would most likely cause build failures.