How to prevent duplicate files? #1412
md5 is not fully implemented. Files uploaded through S3 have an md5. I will leave this issue open until md5 for the filer is implemented.
Will this only be implemented in the filer?
I mean, will this be available only while accessing files via the filer, or through the direct API as well?
What's the difference? Everything goes through the API.
The filer has its own database for storing metadata, right?
md5 is stored in the filer db. Any preference?
No, that is alright. I am not currently using the filer, but will start using it afterwards for md5.
Great, will try shortly. I have one side question: is it possible to retrieve file metadata without the filer?
You can check files on volume servers with HTTP HEAD requests.
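As a concrete illustration of issuing such a HEAD request, here is a minimal sketch in Python. The stub server, its headers, and the fid in the URL are hypothetical stand-ins for a real volume server, whose actual header set may differ:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

def head_metadata(url):
    """Issue an HTTP HEAD request and return the response headers as a dict."""
    req = Request(url, method="HEAD")
    with urlopen(req) as resp:
        return dict(resp.headers)

# --- local stub standing in for a volume server (for demonstration only) ---
class StubVolumeHandler(BaseHTTPRequestHandler):
    def do_HEAD(self):
        # Hypothetical metadata headers; the real server's headers may differ.
        self.send_response(200)
        self.send_header("Content-Length", "1024")
        self.send_header("Content-Type", "image/png")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet

if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), StubVolumeHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # "3,01637037d6" is a made-up fid used purely as an example URL path.
    headers = head_metadata(f"http://127.0.0.1:{port}/3,01637037d6")
    print(headers["Content-Type"])  # image/png
    server.shutdown()
```

Against a real deployment you would point the URL at `http://<volume-server>:<port>/<fid>` and read whatever metadata headers the server returns.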
I am confused about how to use this to prevent duplicates. For example, I have an incoming upload and I want to prevent the same file from being stored again. How do I know which path to make the HEAD request against beforehand? Is there any API through which I can look up or search a file by its MD5 hash?
No. You would need to maintain a MD5 => file mapping yourself.
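A minimal sketch of maintaining such a mapping on the application side. The class name and the fid value are illustrative, not part of any SeaweedFS API; a durable deployment would back the index with SQLite, Redis, or similar:

```python
import hashlib

class Md5Index:
    """In-memory MD5 -> fid index kept alongside SeaweedFS by the application."""

    def __init__(self):
        self._by_md5 = {}

    def lookup(self, data: bytes):
        """Return the fid of already-stored content, or None if it is new."""
        return self._by_md5.get(hashlib.md5(data).hexdigest())

    def register(self, data: bytes, fid: str):
        """Record that this content now lives under the given fid."""
        self._by_md5[hashlib.md5(data).hexdigest()] = fid

idx = Md5Index()
payload = b"example file content"
if idx.lookup(payload) is None:
    # ... upload to SeaweedFS here and get back a fid (the value below is made up) ...
    idx.register(payload, "3,01637037d6")
print(idx.lookup(payload))  # 3,01637037d6
```

Any upload whose md5 is already in the index can then be skipped, and the existing fid returned to the caller instead.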
Alright, then I will upload using FUSE and compute the md5 hash by reading the file locally myself. Is there any way to access metadata (the md5 hash created by SeaweedFS) while using the FUSE-mounted local fs? If so, I won't have to read the file into memory and generate the md5 myself.
Files written through FUSE do not have an md5, because writes can happen randomly anywhere in the file. It is not efficient to recalculate the md5 on every update.
A question: after reading through all this, does it mean that for deduplication we have to maintain the mapping between file digests and files ourselves? Could an API be added to query whether a file exists by md5, sha, or similar? After all, requirements like instant upload (dedup by hash) are quite common.
Ideally SeaweedFS would handle all the deduplication itself: keep the file structure in its original place, but store only one copy of the content.
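A sketch of what that server-side scheme could look like: every path survives in the directory tree, but identical content is stored only once, keyed by its digest. All names here are hypothetical; this is not current SeaweedFS behavior:

```python
import hashlib

class DedupStore:
    """Content-addressed store: paths map to digests, digests map to one blob."""

    def __init__(self):
        self.blobs = {}   # digest -> content, stored exactly once
        self.paths = {}   # path -> digest, one entry per file path

    def put(self, path: str, data: bytes) -> bool:
        """Store data under path; return True if the content already existed."""
        digest = hashlib.sha256(data).hexdigest()
        existed = digest in self.blobs
        if not existed:
            self.blobs[digest] = data
        self.paths[path] = digest
        return existed

    def get(self, path: str) -> bytes:
        """Resolve a path to its digest, then fetch the shared blob."""
        return self.blobs[self.paths[path]]
```

With this layout a second upload of identical bytes only adds a path entry, which is the behavior the comment above is asking for.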
Does SeaweedFS store the md5 hash of a file, which can later be used to prevent duplicates while uploading?