How to detect if a file was actually saved? #1
Comments
Currently, you'd have to pre-check whether the file exists before calling
obj = ...  # path or readable object
stream = hashfs.hashfs.Stream(obj)  # if you need an iterable wrapper for object
file_id = fs.computehash(stream)
if fs.exists(file_id):
    # Handle duplicate file
    filepath = fs.idpath(file_id)
    address = hashfs.HashAddress(file_id, fs.relpath(filepath), filepath)
else:
    # Handle unique file
    address = fs.put(stream)
This is somewhat sub-optimal since the file hash is calculated twice if
file_id = fs.computehash(stream)
filepath = fs.idpath(file_id)
address = hashfs.HashAddress(file_id, fs.relpath(filepath), filepath)
if fs.exists(file_id):
    # Handle duplicate file
    pass
else:
    # Handle unique file
    fs.copy(stream, file_id)
However, neither of these is ideal since you have to utilize more of the internal implementation of
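The pre-check pattern above can be sketched without HashFS at all. What follows is a minimal, stdlib-only illustration, not the library's implementation: `compute_hash`, `id_path`, and `save_once` are hypothetical helpers, and the two-level directory sharding is an assumption. The point is that the stream is hashed once and the resulting id is reused for both the existence check and the write.

```python
import hashlib
import io
import os
import tempfile


def compute_hash(stream, chunk_size=65536):
    """Return the SHA-256 hex digest of a binary stream, then rewind it."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    stream.seek(0)  # rewind so the caller can still copy the contents
    return digest.hexdigest()


def id_path(root, file_id, depth=2, width=2):
    """Map an id to a sharded path such as root/ab/cd/ef... (assumed layout)."""
    shards = [file_id[i * width:(i + 1) * width] for i in range(depth)]
    return os.path.join(root, *shards, file_id[depth * width:])


def save_once(root, stream):
    """Hash once, then write only if no file with that id exists yet."""
    file_id = compute_hash(stream)
    path = id_path(root, file_id)
    if os.path.isfile(path):
        return file_id, path, True  # duplicate: nothing written
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as out:
        out.write(stream.read())
    return file_id, path, False


root = tempfile.mkdtemp()
first = save_once(root, io.BytesIO(b"same content"))
second = save_once(root, io.BytesIO(b"same content"))
```

Here `first` reports a fresh write and `second` reports a duplicate, while both carry the same id and path.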
Hi Derrick, thanks for your response. Would it be possible to return a tuple from the copy() method? It could return the id and a boolean indicating whether the file was actually saved. Or, if you don't want to modify the copy() method, you could add another method (something like verify_copy()) that does the check:
def verify_copy(self, stream, id, extension=None):
    """Copy the contents of `stream` onto disk with an optional file
    extension appended. The copy process uses a temporary file to store the
    initial contents and then moves this file to its final location.
    """
    filepath = self.idpath(id, extension)
    # Only copy file if it doesn't already exist.
    if not os.path.isfile(filepath):
        with tmpfile(stream, self.fmode) as fname:
            self.makepath(os.path.dirname(filepath))
            shutil.copy(fname, filepath)
        return filepath, True
    else:
        return filepath, False
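The tuple-returning idea can also be tried outside the class. The following standalone sketch assumes a hypothetical `copy_if_absent` function and uses only the standard library (`tempfile.NamedTemporaryFile` stands in for the `tmpfile` helper above, whose exact behavior is not shown here). It returns the path together with a boolean that says whether anything was written.

```python
import io
import os
import shutil
import tempfile


def copy_if_absent(stream, filepath):
    """Copy `stream` to `filepath` unless it exists; return (filepath, saved)."""
    if os.path.isfile(filepath):
        return filepath, False
    os.makedirs(os.path.dirname(filepath), exist_ok=True)
    # Write to a temporary file first, then move it into place, so a
    # half-written file is less likely to be left at the final location.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)
    shutil.move(tmp.name, filepath)
    return filepath, True


target = os.path.join(tempfile.mkdtemp(), "ab", "cdef")
path1, saved1 = copy_if_absent(io.BytesIO(b"payload"), target)
path2, saved2 = copy_if_absent(io.BytesIO(b"payload"), target)
```

The second call finds the file already in place, so `saved2` is `False` and the caller can count it as a duplicate.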
Are you using Changing the return of Another option would be to modify Seems like the options are:
After some more thought, I am leaning towards
#3 looks like the right solution to me because it should not break the interface contract. I will change my code to use the put() method instead of copy(). May I suggest that you rename the internal methods (those that are not supposed to be called directly by external code) to _method()? This is the standard way to indicate that a method is not public. Thanks again for your HashFS project.
…urn HashAddress with "is_duplicate=True" when file with same hash already exists on disk. Refs #1.
This has been released as v0.5.0.
BTW, the new attribute for
address = fs.put(stream)
if address.is_duplicate:
    # Handle duplicate file condition
    pass
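With a duplicate flag on the returned address, the ratio of duplicate files asked about in the original question becomes a simple count. The sketch below is a toy stand-in, not HashFS itself: `Address` and `put` are hypothetical, the store is a flat directory, and `put` takes raw bytes rather than a path or file-like object.

```python
import hashlib
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-in for an address object carrying a duplicate flag.
Address = namedtuple("Address", ["id", "abspath", "is_duplicate"])


def put(root, data):
    """Store `data` (bytes) under its SHA-256 id and report duplicates."""
    file_id = hashlib.sha256(data).hexdigest()
    abspath = os.path.join(root, file_id)
    is_duplicate = os.path.isfile(abspath)
    if not is_duplicate:
        with open(abspath, "wb") as out:
            out.write(data)
    return Address(file_id, abspath, is_duplicate)


root = tempfile.mkdtemp()
uploads = [b"report", b"photo", b"report", b"report"]
addresses = [put(root, data) for data in uploads]

# Booleans sum as 0/1, so this counts the flagged addresses.
duplicates = sum(address.is_duplicate for address in addresses)
ratio = duplicates / len(addresses)
```

In this run two of the four uploads repeat earlier content, so only two files end up on disk.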
Thanks. It works. Giovanni
Hi,
How can I detect whether a file was actually saved with the copy() method? If two files with different names have the same content, their hashes are the same, so the first file is actually saved, while the second gets the same path as the first and the save operation is not performed. In either case, the copy() method returns the address id of the file. How can I detect this? This information is helpful for calculating the ratio of duplicate files.
Thanks,
Giovanni